-
- Linux System Administrators' Guide 0.3
-
-
- Lars Wirzenius
- The Linux Documentation Project
-
- This is version 0.3 of the Linux System Administrators' Guide.
-
- Published August 6, 1995.
-
- The LaTeX source code and other machine-readable formats can be found on the
- Internet via anonymous ftp on sunsite.unc.edu, in the directory
- /pub/Linux/docs/LDP.
-
- Also available are PostScript and TeX .dvi formats, and possibly a plain text
- version (to be released after the other formats). HTML versions may also be
- forthcoming.
-
- Copyright (C) 1993, 1995 Lars Wirzenius.
-
- Hernesaarenkatu 15 A 2, Fin-00150 Helsinki, Finland, lars.wirzenius@helsinki.fi.
-
- UNIX is a trademark of Novell, Inc. Linux is not a trademark, and has no
- connection to UNIX(TM) or Novell.
-
- Permission is granted to make and distribute verbatim copies of this manual
- provided the copyright notice and this permission notice are preserved on all
- copies.
-
- Permission is granted to process the document source code through TeX or other
- formatters and print the results, provided the printed document carries copying
- permission notice identical to this one.
-
- Permission is granted to copy and distribute modified versions of this manual
- under the conditions for verbatim copying, provided that the entire resulting
- derived work is distributed under the terms of a permission
- notice identical to this one.
-
- Permission is granted to copy and distribute translations of this manual into
- another language, under the above conditions for modified versions, except that
- this permission notice may be stated in a translation approved by the Free
- Software Foundation.
-
- The Free Software Foundation may be contacted at:
-
-
- 59 Temple Place Suite 330
- Boston, MA 02111-1307 USA
-
-
- The appendices not written by Lars Wirzenius are copyrighted by their authors,
- and can be copied and distributed only in unmodified form.
-
- The author would appreciate a notification of modifications, translations, and
- printed versions. Thank you.
-
-
-
- This page is dedicated to a future dedication.
-
-
-
- Contents
-
- 1 Introduction
-   1.1 The Linux Documentation Project
-
- 2 Overview of a Linux System
-   2.1 Various parts of an operating system
-   2.2 Important parts of the kernel
-   2.3 Major services in a UNIX system
-   2.4 The filesystem layout
-
- 3 Boots And Shutdowns
-   3.1 An overview of boots and shutdowns
-   3.2 The boot process in closer look
-   3.3 More about shutdowns
-   3.4 Rebooting
-   3.5 Single user mode
-   3.6 Emergency boot floppies
-
- 4 Using Disks and Other Storage Media
-   4.1 Two kinds of devices
-   4.2 Hard disks
-   4.3 Floppies
-   4.4 Formatting
-   4.5 Partitions
-   4.6 Filesystems
-   4.7 Disks without filesystems
-   4.8 Allocating disk space
-
- 5 Directory Tree Overview
-   5.1 Background
-   5.2 The root filesystem
-   5.3 The /usr filesystem
-   5.4 The /var filesystem
-   5.5 The /proc filesystem
-
- 6 Memory Management
-   6.1 What is virtual memory?
-   6.2 Creating a swap area
-   6.3 Using a swap area
-   6.4 Sharing swap areas with other operating systems
-   6.5 Allocating swap space
-   6.6 The buffer cache
-
- 7 Logging In And Out
-   7.1 Logins via terminals
-   7.2 Logins via the network
-   7.3 What login does
-   7.4 X and xdm
-   7.5 Access control
-   7.6 Shell startup
-
- A Design and Implementation of the Second Extended Filesystem
-   A.1 History of Linux filesystems
-   A.2 Basic File System Concepts
-   A.3 The Virtual File System
-   A.4 The Second Extended File System
-   A.5 The Ext2fs library
-   A.6 The Ext2fs tools
-   A.7 Performance Measurements
-   A.8 Conclusion
-
- B Measuring Holes
-
- C The Linux Device List
-   C.1 Introduction
-   C.2 Major numbers
-   C.3 Minor numbers
-   C.4 Additional /dev directory entries
-
-
-
-
-
- Introduction to the ALPHA Versions
-
-
-
- In the beginning, the file was without form, and void; and
- emptiness was upon the face of the bits. And the Fingers of
- the Author moved upon the face of the keyboard. And the Author said, Let there
- be words, and there were words.
-
-
- This is an ALPHA version of the Linux System Administrators' Guide. That
- means that I don't even pretend it contains anything useful, or that anything
- contained within it is factually correct. In fact, if you believe anything that
- I say in this version, and you are hurt because of it, I will cruelly laugh at
- your face if you complain.
-
- Well, almost. I won't laugh, but I also will not consider myself responsible
- for anything.
-
- The purpose of an ALPHA version is to get the stuff out so that other people
- can look at it and comment on it. The latter part is the important one: Unless
- the author gets feedback, the ALPHA version isn't doing anything good.
- Therefore, if you read this `book', please, please, please let me hear your
- opinion about it. I don't care whether you think it is good or bad, I want you
- to tell me about it.
-
- If at all possible, you should mail your comments directly to me, otherwise
- there is a largish chance I will miss them. If you want to discuss things in
- public (on one of the comp.os.linux newsgroups or the mailing list), that is ok
- by me, but please send a copy via mail directly to me as well.
-
-
- I do not much care about the format in which you send your comments, but it
- is essential that you clearly indicate what part of my text you are commenting
- on. I can be contacted at the following e-mail addresses:
-
-
- lars.wirzenius@helsinki.fi
-
-
- wirzeniu@cc.helsinki.fi
-
- wirzeniu@cs.helsinki.fi
-
- wirzeniu@kruuna.helsinki.fi
-
- wirzeniu@hydra.helsinki.fi
-
- (they're all actually the same account, but I give all these, just in case there
- is some weird problem).
-
- This text contains a lot of notes that I have inserted for myself. They
- are identified with "META:". They indicate things that need to be worked on,
- that are missing, that are wrong, or something like that. They are mostly for my
- own benefit and for your amusement; they are not things that I am hoping someone
- else will write for me.
-
-
- If you think that this version of the manual is missing a lot, you are right.
- I am including only those chapters that are at least half finished. New
- chapters will be released as they are written.
-
-
-
- The LDP Rhyme [1]
-
-
- A wondrous thing,
- and beautiful,
- 'tis to write,
- a book.
-
- I'd like to sing,
- of the sweat,
- the blood and tear,
- which it also took.
-
- It started back in,
- nineteen-ninety-two,
- when users whined,
- "we can nothing do!"
-
- They wanted to know,
- what their problem was,
- and how to fix it
- (by yesterday).
-
- We put the answers in,
- a Linux f-a-q,
- hoped to get away,
- from any more writin'.
-
- "That's too long,
- it's hard to search,
- and we don't read it,
- any-which-way!"
-
- Then a few of us,
- joined together
- (virtually, you know),
- to start the LDP.
-
- We started to write,
- or plan, at least,
- several books,
- one for every need.
-
- The start was fun,
- a lot of talk,
- an outline,
- then a slew.
-
- Then silence came,
- the work began,
- some wrote less,
- others more.
-
- A blank screen,
- oh it's horrible,
- it sits there,
- laughs in the face.
-
- We still await,
- the final day,
- when everything,
- will be done.
-
- Until then,
- all we have,
- is a draft,
- for you to comment on.
-
- _____________________________
- [1] The author wishes to remain anonymous. It was posted to the LDP mailing
- list by Matt Welsh.
-
-
-
-
-
- Chapter 1
-
-
-
- Introduction
-
- I pride myself on the fact that my work has no socially redeeming value.
- (John Waters)
-
- This manual, the Linux System Administrators' Guide, describes the system
- administration aspects of using Linux. It is intended for people who know next
- to nothing about system administration (as in "what is it?"), but who have
- already mastered at least the basics of normal usage, which means roughly the
- material covered by the Linux Users' Guide. This manual also doesn't tell you
- how to install Linux; that is described in the Installation and Getting Started
- document. There is some overlap between all the Linux Documentation Project
- manuals, but they all look at things from slightly different angles. See below
- for more information about Linux manuals.
-
- What, then, is system administration? It is all the things that one has to do
- to keep a computer system in a usable shape: backing up files (and
- restoring them if necessary), installing new programs, creating accounts for
- users (and deleting them when no longer needed), making certain that the
- filesystem is not corrupted, and so on. If a computer were, say, a house, system
- administration would be called maintenance, and would include cleaning, fixing
- broken windows, and other such things. System administration is not called
- maintenance, because that would be too simple. [1]
-
-
- The structure of this manual is such that many of the chapters should be
- usable independently, so that if you need information about, say, backups, you
- can read just that chapter. [2] This hopefully makes the book easier to use as a
- reference manual, and makes it possible to read just a small part when needed,
- instead of having to read everything. However, this manual is first and foremost
- a tutorial, and a reference manual only as a lucky coincidence.
-
- _____________________________
- [1] There are some people who do call it that, but that's just because they have
- never read this manual, poor things.
-
-
- This manual is not intended to be used completely by itself. Plenty of the
- rest of the Linux documentation is also important for system administrators.
- After all, a system administrator is just a user with special privileges and
- duties. A very important resource is the man pages, which should always be
- consulted when a command is not familiar.
-
- While this manual is targeted at Linux, a general principle has been that it
- should be useful with other UNIX-based operating systems as well.
- Unfortunately, since there is so much variance between different versions of
- UNIX in general, and in system administration in particular, there is little
- hope of covering all variants. Even covering all possibilities for Linux is
- difficult, due to the nature of its development. There is no one official Linux
- distribution, so different people have different setups, and many people have a
- setup they have built up themselves. When possible, I have tried to point out
- differences, and explain several alternatives. In order to cater to the hackers
- and DIY types that form the driving force behind Linux development, I have tried
- to describe how things work, rather than just listing "five easy steps" for each
- task. This means that there is much information here that is not necessary for
- everyone, but those parts are marked as such and can be skipped if you use a
- preconfigured system. Reading everything will, naturally, increase your
- understanding of the system and should make using and administering it more
- pleasant.
-
- Like all other Linux-related development, the work was done on a volunteer
- basis: I did it because I thought it might be fun and because I felt it should
- be done. However, like all volunteer work, there is a limit to how much effort I
- have been able to spend, and also on how much knowledge and experience I have.
- This means that the manual is not necessarily as good as it would be if a wizard
- had been paid handsomely to write it and had spent a few years to perfect it. I
- think, of course, that it is pretty nice, but be warned.
-
-
- One particular point where I have cut corners is that I have not covered very
- thoroughly many things that are already well documented in other freely
- available manuals. This applies especially to program-specific documentation,
- such as all the details of using mkfs(8). I only describe the purpose of the
- program, and as much of its usage as is necessary for the purposes of this
- manual. For further information, I refer the gentle reader to these other
- manuals. Usually, all of the referred-to documentation is part of the full Linux
- documentation set.
-
- _____________________________
- [2] If you happen to be reading a version that has a chapter on backups, that is.
-
-
- While I have tried to make this manual as good as possible, I would really
- like to hear from you if you have any ideas on how to make it better. Bad
- language, factual errors, ideas for new areas to cover, rewritten sections,
- information about how various UNIX versions do things, I am interested in all of
- it. You can contact me via electronic mail with the Internet domain address
- lars.wirzenius@helsinki.fi, or by traditional paper mail using the address
-
-
- Lars Wirzenius / Linux docs
- Hernesaarentie 15 A 2
- 00150 Helsinki
- Finland
-
-
- Many people have helped me with this book, directly or indirectly. I would
- like to especially thank Matt Welsh for inspiration and LDP leadership, Andy
- Oram for igniting an almost dead spark again with much-valued feedback, Olaf
- Kirch for showing me that it can be done, and Adam Richter at Yggdrasil and
- others for showing me that other people can find it interesting as well.
-
- H. Peter Anvin, Rémy Card, Theodore Ts'o, and Stephen Tweedie have let me
- borrow their work (and thus make the book look thicker and much more
- impressive). I am most grateful for this, and very apologetic for the earlier
- versions that sometimes lacked proper attribution. Stephen Tweedie also let me
- borrow his comparison of the xia and ext2 filesystems, but that has since been
- dropped, since xia is no longer very popular.
-
-
- In addition, I would like to thank Mark Komarinski for sending his material
- in 1993 and the many system administration columns in Linux Journal. They are
- quite informative.
-
-
- Thanks to Erik Troan at Red Hat, for promising to make a plain text version
- of this book available. [3]
-
-
- A minor accusation goes to Linus Torvalds for writing the damn system to
- write about in the first place. That applies to the rest of
- /usr/src/linux/CREDITS as well. Be ashamed, be very ashamed.
-
-
- Many useful comments have been sent by a large number of people. My
- miniature black hole of an archive doesn't let me find all their names, but some
- of them are, in alphabetical order: Paul Caprioli, Ales Cepek, Marie-France
- Declerfayt, Olaf Flebbe, Helmut Geyer, Larry Greenfield and his father, Stephen
- Harris, Jyrki Havia, Jim Haynes, York Lam, Timothy Andrew Lister, Jim Lynch,
- Dan Poirier, Daniel Quinlan, Philippe Steindl. My apologies to anyone I have
- forgotten.
-
- _____________________________
- [3] Erik, you can color yourself pressurized.
-
- 1.1 The Linux Documentation Project
-
-
-
- The Linux Documentation Project, or LDP, is a loose team of writers,
- proofreaders, and editors who are working together to provide complete
- documentation for the Linux operating system. The overall coordinator of the
- project is Matt Welsh, who is aided by Lars Wirzenius and Michael K. Johnson.
-
-
- This manual is one in a set of several being distributed by the LDP,
- including a Linux Users' Guide, System Administrators' Guide, Network
- Administrators' Guide, and Kernel Hackers' Guide. These manuals are all
- available in LaTeX source format, .dvi format, and PostScript output by
- anonymous FTP from sunsite.unc.edu, in the directory /pub/Linux/docs/LDP, and
- from tsx-11.mit.edu, in the directory /pub/linux/docs/guides.
-
-
- We encourage anyone with a penchant for writing or editing to join us in
- improving Linux documentation. If you have Internet e-mail access, you can
- contact Matt Welsh at mdw@sunsite.unc.edu.
-
-
- Chapter 2
-
-
-
- Overview of a Linux System
-
-
-
- A quote is needed.
- This chapter gives an overview of a Linux system. First, the major services
- provided by the operating system are described. Then, the programs that
- implement these services are described with a considerable lack of detail. The
- purpose of this chapter is to give an understanding of the system as a whole;
- each part is described in detail elsewhere.
-
- 2.1 Various parts of an operating system
-
-
-
- A UNIX operating system consists of a kernel and some system programs. There
- are also some application programs for doing work. The kernel is the heart of
- the operating system. [1] It keeps track of files on the disk, starts programs
- and multiplexes the processor and other hardware between them to provide
- multitasking, assigns memory and other resources to various processes, receives
- packets from and sends packets to the network, and so on. The kernel does very
- little by itself, but it provides tools with which all services can be built. It
- also prevents anyone from accessing the hardware directly, forcing everyone to
- use the tools it provides. This way the kernel can control who gets to do what
- and can provide some protection for users from each other. The tools provided by
- the kernel are used via system calls; see manual page section 2 for more
- information on these.
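-
- As a rough illustration of what using the kernel's tools via system calls
- looks like from a program, here is a minimal sketch in Python. It assumes
- Python's os module, whose open, write, read, and close functions are thin
- wrappers around the system calls of the same names; the function name and the
- file name are made up for the example.
-
```python
import os
import tempfile


def write_then_read(path, data):
    """Write data to path and read it back using low-level os calls."""
    # os.open wraps the open(2) system call and returns a file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)              # write(2)
    finally:
        os.close(fd)                    # close(2)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data))   # read(2)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_read(os.path.join(tmp, "demo.txt"), b"hello, kernel\n"))
```
-
- The program never touches the disk hardware itself; every step is a request
- to the kernel.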
-
-
- The system programs use the tools provided by the kernel to implement the
- various services required from an operating system. System programs, and all
- other programs, run `on top of the kernel', in what is called the user mode. The
- difference between system and application programs is one of intent:
- applications are intended for getting useful things done (or for playing, if it
- happens to be a game), whereas system programs are needed to get the system
- working. A word processor is an application; telnet is a system program. The
- difference is often somewhat blurry, however, and is important only to
- compulsive categorizers.
-
- _____________________________
- [1] In fact, it is often mistakenly considered to be the operating system
- itself, but it is not. An operating system provides many more services than a
- plain kernel.
-
-
- An operating system can also contain compilers and their corresponding
- libraries (GCC and the C library in particular under Linux), although not all
- programming languages need be part of the operating system. Documentation, and
- sometimes even games, can also be part of it. Traditionally, the operating
- system has been defined by the contents of the installation tape or disks; with
- Linux it is not as clear since the stupid thing is spread all over the FTP sites
- of the world.
-
- 2.2 Important parts of the kernel
-
-
-
- The Linux kernel consists of several important parts: process management, memory
- management, hardware device drivers, filesystem drivers, network management, and
- various other bits and pieces. Figure 2.1 shows some of them.
-
- Probably the most important parts of the kernel (nothing else works without
- them) are memory management and process management. Memory management
- takes care of assigning memory areas and swap space areas to processes, parts of
- the kernel, and the buffer cache. Process management creates processes and
- implements multitasking by switching the active process on the processor.
-
- At the lowest level, the kernel contains a hardware device driver for each
- kind of hardware it supports. Since the world is full of different kinds of
- hardware, the number of hardware device drivers is large. There are often many
- otherwise similar pieces of hardware that differ in how they are controlled by
- software. The similarities make it possible to have general classes of drivers
- that support similar operations; each member of the class has the same interface
- to the rest of the kernel but differs in what it needs to do to implement them.
- For example, all hard disk drivers look alike to the rest of the kernel, i.e.,
- they all have operations like `initialize the drive', `read sector N', and
- `write sector N'.
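-
- The idea of a driver class with one shared interface can be sketched in
- Python as follows. This is only an illustration, not kernel code: the class
- names, the in-memory "disk", and the 512-byte sector size are all assumptions
- made for the example.
-
```python
from abc import ABC, abstractmethod

SECTOR_SIZE = 512  # assumed sector size for the example


class DiskDriver(ABC):
    """The interface the rest of the 'kernel' sees for any hard disk."""

    @abstractmethod
    def initialize(self):
        """Prepare the drive for use."""

    @abstractmethod
    def read_sector(self, n):
        """Return the contents of sector n."""

    @abstractmethod
    def write_sector(self, n, data):
        """Store data in sector n."""


class RamDiskDriver(DiskDriver):
    """A pretend driver that keeps its sectors in memory."""

    def __init__(self, sectors):
        self.sectors = sectors
        self.storage = None

    def initialize(self):
        self.storage = [bytes(SECTOR_SIZE) for _ in range(self.sectors)]

    def read_sector(self, n):
        return self.storage[n]

    def write_sector(self, n, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("a sector must be exactly SECTOR_SIZE bytes")
        self.storage[n] = bytes(data)


def copy_sector(src, dst, n):
    """Works with any DiskDriver, whatever hardware is behind it."""
    dst.write_sector(n, src.read_sector(n))
```
-
- Code such as copy_sector only relies on the shared operations, which is the
- point of having driver classes.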
-
- [Figure 2.1: Some of the more important parts of the Linux kernel.]
-
- Some software services provided by the kernel itself have similar properties.
- For example, the various network protocols have been abstracted into one
- programming interface, the BSD socket library. Another example is the set of
- filesystems Linux supports: the kernel contains a virtual filesystem (VFS)
- layer that defines all the operations for a filesystem, and a filesystem driver
- for each supported filesystem. When some entity tries to use a filesystem, the
- request goes via the VFS, which routes the request to the proper filesystem
- driver.
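-
- The routing done by the VFS can be sketched roughly like this. The sketch
- assumes that each driver is attached at a mount point and that the driver with
- the longest matching mount point handles a request; the class names and the
- single open operation are invented for illustration.
-
```python
class FilesystemDriver:
    """Stand-in for a real filesystem driver; it only reports requests."""

    def __init__(self, name):
        self.name = name

    def open(self, relative_path):
        return f"{self.name}: open {relative_path}"


class VFS:
    """Routes each request to the driver mounted closest to the path."""

    def __init__(self):
        self.mounts = {}   # mount point -> driver

    def mount(self, mount_point, driver):
        self.mounts[mount_point.rstrip("/") or "/"] = driver

    def open(self, path):
        # Pick the longest mount point that is a prefix of the path.
        best = None
        for point in self.mounts:
            prefix = point if point == "/" else point + "/"
            if path == point or path.startswith(prefix):
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise FileNotFoundError(path)
        relative = path[len(best):].lstrip("/")
        return self.mounts[best].open(relative)
```
-
- A caller only ever talks to the VFS object; which driver does the work
- depends on the path.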
-
- 2.3 Major services in a UNIX system
-
- This section describes some of the more important UNIX services, but without
- much detail. They are described more thoroughly in later chapters.
-
-
-
- 2.3.1 init
-
-
-
- The single most important service in a UNIX system is provided by init. init is
- started as the first process of every UNIX system, as the last thing the kernel
- does when it boots. When init starts, it continues the boot process by doing
- various startup chores (checking and mounting filesystems, starting daemons,
- etc).
-
- The exact list of things that init does depends on which flavor it is; there
- are several to choose from. init usually provides the concept of single user
- mode, in which no one can log in and root uses a shell at the console; the usual
- mode is called multiuser mode. Some flavors generalize this as run levels;
- single and multiuser modes are considered to be two run levels, and there can be
- additional ones as well, for example, to run X on the console.
-
-
- When the system is running, the two most important tasks of init are to make
- sure that gettys are working (so that logins work) and that various daemons are
- running, and to adopt orphan processes (processes whose parent has died; in UNIX
- all processes must be in a single tree, so orphans must be adopted).
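-
- The adoption of orphans can be pictured with a toy model of the process
- tree. This is a simulation only, assuming that init has process ID 1; the
- class and method names are made up, and nothing here inspects real processes.
-
```python
INIT_PID = 1  # assumption: init is process 1


class ProcessTable:
    """A toy process tree: every process except init has a parent."""

    def __init__(self):
        self.parent = {INIT_PID: None}   # pid -> parent pid

    def spawn(self, pid, ppid):
        self.parent[pid] = ppid

    def exit(self, pid):
        """Remove a process; init adopts any children it leaves behind."""
        del self.parent[pid]
        for child, ppid in self.parent.items():
            if ppid == pid:
                self.parent[child] = INIT_PID
```
-
- After a parent exits, its children hang directly under init, so the tree
- stays in one piece.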
-
-
- When the system is shut down, it is init that is in charge of killing all
- other processes, unmounting all filesystems and stopping the processor, along
- with anything else that it feels like doing.
-
- 2.3.2 Logins from terminals
-
-
-
- Logins from terminals (via serial lines) and the console (when not running X)
- are provided by the getty program. init starts a separate instance of getty for
- each terminal for which logins are to be allowed. getty reads the username and
- runs the login program, which reads the password. If the username and password
- match, login runs the shell. When the shell terminates, i.e., the user logs
- out, or when login terminates because the username and password didn't match,
- init notices this and starts a new instance of getty. The kernel has no notion
- of logins; this is all handled by the system programs.
-
- 2.3.3 Syslog
-
-
-
- The kernel and many system programs produce error, warning, and other messages.
- It is often important that these messages can be viewed later, even much later,
- so they should be written to a file. The program doing this is syslog. It can be
- configured to sort the messages to different files according to writer or degree
- of importance.
- For example, kernel messages are often directed to a separate file from the
- others, since kernel messages are often more important and need to be read
- regularly to spot problems.
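-
- Sorting messages by writer and importance can be sketched as a small rule
- table. The sketch below is not the real syslog configuration format; the rule
- layout and the file names are assumptions for illustration, and only the
- priority names follow the usual syslog levels.
-
```python
PRIORITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]

# (facility, minimum priority, destination); "*" matches any facility.
# The file names are illustrative assumptions, not a standard layout.
RULES = [
    ("kern", "debug", "/var/log/kernel"),
    ("mail", "info", "/var/log/mail"),
    ("*", "warning", "/var/log/messages"),
]


def destinations(facility, priority, rules=RULES):
    """Return every file a message with this facility and priority goes to."""
    level = PRIORITIES.index(priority)
    return [
        dest
        for rule_facility, minimum, dest in rules
        if rule_facility in ("*", facility)
        and level >= PRIORITIES.index(minimum)
    ]
```
-
- With these rules, all kernel messages end up in their own file, and anything
- at warning level or above is also collected in a common file.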
-
- 2.3.4 Periodic command execution: cron and at
-
-
-
- Both users and the system administrator often need to run specific commands
- periodically. For example, the system administrator might want to run a command
- that removes old files from the directories for temporary files (/tmp and
- /var/tmp), to keep the disks from filling up, since not all programs clean up
- after themselves correctly.
-
- The cron service is set up to do this. Each user has a crontab, where he
- lists the commands he wants to execute and the times they should be executed.
- The crond daemon takes care of starting the commands when specified.
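-
- To give a feel for what a crontab entry says, here is a simplified sketch of
- the check a cron daemon has to make every minute. It assumes the common
- five-field layout (minute, hour, day of month, month, day of week) followed by
- the command, and supports only `*' and comma-separated numbers; ranges, steps,
- and cron's special treatment of the two day fields are left out. The script
- name in the example entry is hypothetical.
-
```python
from datetime import datetime


def field_matches(field, value):
    """Match one crontab field: '*' or a comma-separated list of numbers."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def is_due(entry, when):
    """Return True if a five-field crontab entry should run at `when`."""
    minute, hour, day, month, weekday, _command = entry.split(None, 5)
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # crontab counts Sunday as 0; isoweekday() counts it as 7.
        and field_matches(weekday, when.isoweekday() % 7)
    )


# Hypothetical entry: run a cleanup script at 04:30 every day.
EXAMPLE_ENTRY = "30 4 * * * /usr/local/sbin/clean-tmp"
```
-
- A daemon built on this would call is_due for every entry once a minute and
- start the commands that match.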
-
-
- The at service is similar to cron, but it is once only: the command is
- executed at the given time, but it is not repeated.
-
- 2.3.5 Graphical user interface
-
-
-
- UNIX and Linux don't incorporate the user interface into the kernel; instead,
- they let it be implemented by user-level programs. This applies to both text
- mode and graphical environments.
-
-
- This arrangement makes the system more flexible, but has the disadvantage
- that it is simple to implement a different user interface for each program,
- making the system harder to learn.
-
-
- The graphical environment primarily used with Linux is called the X Window
- System (X for short). X also does not implement a user interface; it only
- implements a window system, i.e., tools with which a graphical user interface
- can be implemented.
-
- The three most popular user interface styles implemented over X are Athena,
- Motif, and Open Look.
-
- 2.3.6 Networking
-
-
-
- Networking is the act of connecting two or more computers so that they can
- communicate with each other. The actual methods of connecting and communicating
- are slightly complicated, but the end result is very attractive.
-
-
-
- UNIX operating systems have many networking features. Most basic services
- (filesystems, printing, backups, etc.) can be done over the network. This can
- make system administration easier, since it allows centralized administration,
- while still reaping the benefits of microcomputing and distributed computing,
- such as lower costs and better fault tolerance.
-
-
- However, this book merely glances at networking; see the Linux Network
- Administrators' Guide for more information, including a basic description of
- how networks operate.
-
- 2.3.7 Network logins
-
-
-
- Network logins work a little differently than normal logins. For normal
- logins, there is a separate physical serial line for each terminal via which it
- is possible to log in. For each person logging in via the network, there is a
- separate virtual network connection, and there can be any number of these. [2]
- It is therefore not possible to run a separate getty for each possible virtual
- connection. There are also several different ways to log in via network, telnet
- and rlogin being the major ones in TCP/IP networks.
-
-
- Network logins have, instead of a herd of gettys, a single daemon (per way of
- logging in; telnet and rlogin have separate daemons) that listens for all
- incoming login attempts. When it notices one, it starts a new instance of itself
- to handle that single attempt; the original instance continues to listen for
- other attempts. The new instance works similarly to getty.
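-
- The listen-then-hand-off pattern can be sketched with a few lines of Python
- on a UNIX-like system. This is not how any particular login daemon is written:
- the function name and the "login: " greeting are assumptions, and a real daemon
- would also collect its finished children and run the actual login dialogue.
-
```python
import os
import socket


def accept_and_fork(listener):
    """Accept one connection and hand it to a new child process.

    Returns the child's pid in the parent, which is then free to go
    back to listening for further attempts.
    """
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child: serve this single connection, then exit.
        listener.close()
        conn.sendall(b"login: ")
        conn.close()
        os._exit(0)
    # Parent: the child owns the connection now.
    conn.close()
    return pid
```
-
- The original process would call accept_and_fork in a loop, so that each
- login attempt gets its own process.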
-
- 2.3.8 Network file systems
-
-
-
- One of the more useful things that can be done with networking services is
- sharing files via a network file system. The one usually used is called the
- Network File System, or NFS, developed by Sun.
-
-
- With a network file system any file operations done by a program on one
- machine are sent over the network to another computer. This fools the program
- into thinking that all the files on the other computer are actually on the
- computer the program is running on. This makes information sharing extremely
- simple, since it requires no modifications to programs.
-
- _____________________________
- [2] Well, at least there can be many. Network bandwidth still being a scarce
- resource, there is still some practical upper limit to the number of concurrent
- logins via one network connection.
-
-
-
- 2.3.9 Mail
-
-
-
- Electronic mail is usually the most important method for communicating via
- computer. An electronic letter is stored in a file using a special format, and
- special mail programs are used to send and read the letters.
-
-
- Each user has an incoming mailbox (a file in the special format), where all
- new mail is stored. When someone sends mail, the mail program locates the
- receiver's mailbox and appends the letter to the mailbox file. If the
- receiver's mailbox is on another machine, the letter is sent to the other
- machine, which delivers it to the mailbox as it best sees fit.
-
-
- The mail system consists of many programs. The delivery of mail to local or
- remote mailboxes is done by one program (e.g., sendmail or smail), while the
- programs users use are many and varied (e.g., Pine or elm). The mailboxes are
- usually stored in /var/spool/mail.
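-
- Local delivery, i.e., appending a letter to a mailbox file, can be sketched
- with Python's standard mailbox module. The sketch assumes that the mailbox uses
- the traditional mbox format, which the text above does not specify; the deliver
- function and its parameters are invented for the example, and the file locking
- that real delivery programs need is left out.
-
```python
import mailbox
from email.message import EmailMessage


def deliver(mailbox_path, sender, recipient, subject, body):
    """Append one letter to the recipient's mailbox file."""
    letter = EmailMessage()
    letter["From"] = sender
    letter["To"] = recipient
    letter["Subject"] = subject
    letter.set_content(body)

    box = mailbox.mbox(mailbox_path)   # the file is created if missing
    try:
        box.add(letter)                # append to the end of the mailbox
        box.flush()                    # write the change to disk
    finally:
        box.close()
```
-
- Reading programs open the same file and step through the letters stored in
- it.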
-
- 2.3.10 Printing
-
-
-
- Only one person can use a printer at one time, but it is uneconomical not to
- share printers between users. The printer is therefore managed by software that
- implements a print queue: all print jobs are put into a queue and whenever the
- printer is done with one job, the next one is sent to it automatically. This
- relieves the users from organizing the print queue and fighting over control of
- the printer.3
-
-
- The print queue software also spools the printouts on disk, i.e., the text is
- kept in a file while the job is in the queue. This allows an application program
- to spit out the print jobs quickly to the print queue software; the application
- does not have to wait until the job is actually printed to continue. This is
- really convenient, since it allows one to print out one version, and not have to
- wait for it to be printed before one can make a completely revised new version.
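
The queueing and spooling idea can be sketched in a few lines of Python. This is a toy model, not how any real print spooler is implemented: the PrintQueue class, its method names, and the job file names are made up for the illustration, and the "printer" is just a function that receives the text.

```python
import os
import tempfile
from collections import deque


class PrintQueue:
    """A toy print queue: jobs are spooled to files and printed in arrival order."""

    def __init__(self, spool_dir):
        self.spool_dir = spool_dir
        self.jobs = deque()      # paths of spooled jobs, oldest first
        self.next_id = 1

    def submit(self, text):
        """Spool a job to disk and return at once; nothing is printed yet."""
        path = os.path.join(self.spool_dir, "job%03d" % self.next_id)
        self.next_id += 1
        with open(path, "w") as spool_file:
            spool_file.write(text)
        self.jobs.append(path)
        return path

    def print_next(self, printer):
        """Send the oldest job to the printer; return False if the queue is empty."""
        if not self.jobs:
            return False
        path = self.jobs.popleft()
        with open(path) as spool_file:
            printer(spool_file.read())
        os.remove(path)          # the spooled copy is no longer needed
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool_dir:
        queue = PrintQueue(spool_dir)
        queue.submit("draft, version 1\n")
        queue.submit("draft, version 2\n")   # submitted before version 1 is printed
        while queue.print_next(lambda text: print(text, end="")):
            pass
```

Because submit only writes a file, the application gets control back immediately, which is the convenience described above.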
-
- 2.4 The filesystem layout
-
-
-
- The filesystem is divided into many parts; usually along the lines of a root
- filesystem with /bin, /lib, /etc, /dev, and a few others; a /usr filesystem with
- programs and unchanging data; a /var filesystem with changing data (such as log
- files); and a /home filesystem for everyone's personal files. Depending on the
- hardware configuration and the decisions of the system administrator, the
- division can be different; it can even be all in one filesystem.
- _____________________________3
- Instead, they form a new queue at the printer, waiting for their printouts,
- since no-one ever seems to be able to get the queue software to know exactly
- when anyone's printout is really finished. This is a great boost for intra-office
- social relations.
-
-
- Chapter 5 describes the filesystem layout in some detail; the Linux
- Filesystem Standard covers it in somewhat more detail.
-
-
- Chapter 3
-
-
-
- Boots And Shutdowns
-
-
-
- This chapter needs a quote. Suggestions, anyone?
- This section explains what goes on when a Linux system is turned on and off, and
- how it should be done properly.
-
- 3.1 An overview of boots and shutdowns
-
- The act of turning on a computer system and causing its operating system to be
- loaded1 is called booting. The name comes from an image of the computer pulling
- itself up by its bootstraps, but the act itself is slightly more realistic.
-
- During bootstrapping the computer first loads a small piece of code called
- the bootstrap loader, which in turn loads and starts the operating system. The
- bootstrap loader is usually stored in a fixed location on a hard disk or a
- floppy. The reason for this two-step process is that the operating system is big
- and complicated, but the first piece of code that the computer loads must be
- very small (a few hundred bytes), to avoid making the hardware unnecessarily
- complicated.
-
-
- Different computers do the bootstrapping differently. For PCs, the computer
- (well, its BIOS) reads in the first sector (called the boot sector) of a floppy
- or hard disk. The bootstrap loader is contained within this sector. It loads
- the operating system from elsewhere on the disk (or from some other place).
-
- After Linux has been loaded, it initializes the hardware and device drivers,
- and then runs init(8). init starts other processes to allow users to log in and
- do things. The details of this part will be discussed below.
- _____________________________1
- On early computers, it wasn't enough to merely turn on the computer; you had
- to manually load the operating system as well. These new-fangled thing-a-ma-gigs
- do it all by themselves.
-
-
- In order to shut down a Linux system, first all processes are told to
- terminate (this makes them close any files and do other necessary things to keep
- things tidy), then filesystems and swap areas are unmounted, and finally a
- message is printed to the console that the power can be turned off. If the
- proper procedure is not followed, terrible things can and will happen; most
- importantly, the filesystem buffer cache might not be flushed, which means that
- all data in it is lost and the filesystem on disk is inconsistent, and therefore
- possibly unusable.
-
- 3.2 A closer look at the boot process
-
- You can boot Linux either from a floppy or from the hard disk. The installation
- section in the Getting Started guide tells you how to install Linux so you can
- boot it the way you want to.
-
-
- When the computer is booted, the BIOS will do various tests to check that
- everything looks all right,2 and will then start the actual booting. It will
- choose a disk drive (typically the first floppy drive, if there is a floppy
- inserted, otherwise the first hard disk, if one is installed in the computer;
- the order might be configurable, however) and will then read its very first
- sector. This is called the boot sector; for a hard disk, it is also called the
- master boot record, since a hard disk can contain several partitions, each with
- its own boot sector.
-
- The boot sector contains a small program (small enough to fit into one
- sector) whose responsibility is to read the actual operating system from the
- disk and start it. When booting Linux from a floppy disk, the boot sector
- contains code that just reads the first few hundred blocks (depending on the
- actual kernel size, of course) to a predetermined place in memory. On a Linux
- boot floppy, there is no filesystem; the kernel is just stored in consecutive
- sectors, since this simplifies the boot process. It is possible, however, to
- boot from a floppy with a filesystem, by using LILO.
-
- _____________________________2
- This is called the power on self test, or POST for short.
-
- When booting from the hard disk, the code in the master boot record will
- examine the partition table (also in the master boot record), identify the
- active partition (the partition that is marked to be bootable), read the boot
- sector from that partition, and then start the code in that boot sector. The
- code in the partition's boot sector does what a floppy disk's boot sector does:
- it will read in the kernel from the partition and start it. The details vary,
- however, since it is generally not useful to have a separate partition for just
- the kernel image, so the code in the partition's boot sector can't just read the
- disk in sequential order; it has to find the sectors wherever the filesystem has
- put them. There are several ways around this problem, but the most common way is
- to use LILO. (The details about how to do this are irrelevant for this
- discussion, however; see the LILO documentation for more information; it is most
- thorough.)
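
To make the "examine the partition table, identify the active partition" step concrete, here is a small Python sketch that decodes the partition table of a master boot record. It assumes the conventional PC layout, which the text above does not spell out: four 16-byte entries starting at byte 446, a boot indicator of 0x80 for the active partition, and the signature bytes 0x55 0xAA at the end of the sector. The function names and the example partition values are invented; the sketch works on a made-up sector in memory rather than on a real disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446      # the partition table starts here in the master boot record
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80      # boot indicator value that marks the active partition
ENTRY_FORMAT = "<B3xB3xII"  # flag, (CHS start), type, (CHS end), first sector, count


def read_partition_table(mbr):
    """Return a list of (number, active, type_code, start_sector, sector_count)."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid master boot record")
    entries = []
    for index in range(4):
        begin = TABLE_OFFSET + index * ENTRY_SIZE
        raw = mbr[begin:begin + ENTRY_SIZE]
        flag, type_code, start, count = struct.unpack(ENTRY_FORMAT, raw)
        if type_code != 0:                      # type 0 means an unused slot
            entries.append((index + 1, flag == ACTIVE_FLAG, type_code, start, count))
    return entries


def active_partition(mbr):
    """Return the number of the first active partition, or None."""
    for number, active, _type, _start, _count in read_partition_table(mbr):
        if active:
            return number
    return None


def make_example_mbr():
    """Build a made-up MBR with two partitions, the second one active."""
    mbr = bytearray(SECTOR_SIZE)
    mbr[446:462] = struct.pack(ENTRY_FORMAT, 0x00, 0x06, 63, 204800)
    mbr[462:478] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 204863, 409600)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    example = make_example_mbr()
    for entry in read_partition_table(example):
        print(entry)
    print("active partition:", active_partition(example))
```

On a real system the 512 bytes would come from the disk's device file (for example /dev/hda), which normally requires root privileges to read.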
-
-
- When booting with LILO, LILO will normally go right ahead and read in and boot
- the default kernel. It is also possible to configure LILO to be able to boot one
- of several kernels, or even operating systems other than Linux, and it is
- possible for the user to choose which kernel or operating system is to be booted
- at boot time. LILO can be configured so that if one holds down the alt, shift,
- or ctrl key at boot time (i.e., when LILO is loaded), LILO will ask what is to
- be booted and not boot the default right away. Alternatively, LILO can be
- configured so that it will always ask, with an optional timeout that will cause
- the default kernel to be booted.
-
-
- There are boot loaders other than LILO. However, since LILO has been written
- especially for Linux, it has some features that are useful and that only it
- provides, for example the ability to pass arguments to the kernel at boot time,
- or to override some configuration options built into the kernel. Hence, it is
- usually the best choice. Among the alternatives are bootlin and bootactv.3
-
-
- Booting from floppy and booting from hard disk both have their advantages, but
- generally booting from the hard disk is nicer, since it avoids the hassle of
- playing around with floppies. It is also faster. However, it can be more
- troublesome to install the system so it can boot from the hard disk, so many
- people will first boot from floppy, then, when the system is otherwise installed
- and working well, will install LILO and start booting from the hard disk.
-
-
- After the Linux kernel has been read into the memory, by whatever means, and
- is started for real, roughly the following things happen:
-
- o The Linux kernel is installed compressed, so it will first uncompress itself.
- The beginning of the compressed kernel contains a small uncompressed program
- that does this.
-
- o If you have a super-VGA card that Linux recognizes and that has some special
- text modes (such as 100 columns by 40 rows), Linux asks you which mode you want
- to use. During the kernel compilation, it is possible to preset a video mode, so
- that this is never asked. This can also be done with LILO or rdev(8).
- _____________________________3
- I don't know much about any of the alternatives. If and when I learn, I will
- add more descriptions.
-
-
-
- o After this the kernel checks what other hardware there is (hard disks,
- floppies, network adapters, ...), and configures some of its device drivers
- appropriately; while it does this, it outputs messages about its findings. For
- example, when I boot, it looks like this:
-
- LILO boot:
- Loading linux.
- Console: colour EGA+ 80x25, 8 virtual consoles
- Serial driver version 3.94 with no serial options enabled
- tty00 at 0x03f8 (irq = 4) is a 16450
- tty01 at 0x02f8 (irq = 3) is a 16450
- lp_init: lp1 exists (0), using polling driver
- Memory: 7332k/8192k available (300k kernel code, 384k reserved, 176k data)
- Floppy drive(s): fd0 is 1.44M, fd1 is 1.2M
- Loopback device init
- Warning WD8013 board not found at i/o = 280.
- Math coprocessor using irq13 error reporting.
- Partition check:
- hda: hda1 hda2 hda3
- VFS: Mounted root (ext filesystem).
- Linux version 0.99.pl9-1 (root@haven) 05/01/93 14:12:20
-
-
- The exact texts are different on different systems, depending on the
- hardware, the version of Linux being used, and how it has been configured.
-
- o Then the kernel will try to mount the root filesystem. The place is
- configurable at compilation time, or any time with rdev or LILO. The filesystem
- type is detected automatically. If the mounting of the root filesystem fails,
- for example because you didn't remember to include the corresponding filesystem
- driver in the kernel, the kernel panics and halts the system (there isn't much
- it can do, anyway).
-
- The root filesystem is usually mounted read-only (this can be set in the
- same way as the place). This makes it possible to check the filesystem while it
- is mounted; it is not a good idea to check a filesystem that is mounted
- read-write.
-
-
- o After this, the kernel starts the program init(8) (located in /sbin/init) in
- the background (this will always become process number 1). init does various
- startup chores. The exact things it does depend on the version being used; see
- chapter ?? for more information.
-
- o init then starts a getty(8) for virtual consoles and serial lines. getty is
- the program which lets people log in via virtual consoles and serial terminals.
- init may also start some other programs, depending on how it is configured.
-
- o After this, the boot is complete, and the system is up and running normally.
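
One detail from the list above, whether the root filesystem is mounted read-only, can be checked by looking at the mount table. The Python sketch below assumes the table is available as text with one filesystem per line in the order "device, mount point, type, options" (the layout used by /etc/mtab); the function names and the sample table are invented for the illustration.

```python
def root_mount_options(mount_table_text):
    """Return the option list for the filesystem mounted on /, or None."""
    options = None
    for line in mount_table_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            options = fields[3].split(",")   # a later entry overrides an earlier one
    return options


def root_is_read_only(mount_table_text):
    """True if the root filesystem is listed with the ro option."""
    options = root_mount_options(mount_table_text)
    return options is not None and "ro" in options


if __name__ == "__main__":
    # A made-up mount table, as it might look early in the boot.
    sample = "/dev/hda2 / ext2 ro 0 0\n/dev/hda3 /usr ext2 rw 0 0\n"
    print("root read-only:", root_is_read_only(sample))
```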
-
-
-
-
- 3.3 More about shutdowns
-
-
-
- META: two different implementations of shutdown? one that uses reboot/halt as
- internal binaries that shouldn't be run by hand?
-
-
- It is important to follow the correct procedures when you shut down a Linux
- system. If you fail to do so, your filesystems probably will become trashed and
- the files probably will become scrambled. This is because Linux has a disk cache
- that won't write things to disk at once, but only at intervals. This greatly
- improves performance but also means that if you just turn off the power on a
- whim, the cache may hold a lot of data, and what is on the disk may not be a
- fully working filesystem (because only some things have been written to the
- disk).
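
The effect of the disk cache can be seen from a program's point of view as well. In the Python sketch below, write returns as soon as the data has been buffered; only the explicit os.fsync call asks the kernel to write this file's cached data to the disk. The function name write_durably and the file name are made up for the example.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and do not return until it has been handed to the disk."""
    with open(path, "wb") as f:
        f.write(data)          # the data may still sit in the program's own buffer
        f.flush()              # push it into the kernel's buffer cache
        os.fsync(f.fileno())   # ask the kernel to write this file's buffers to disk
    return len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "important.txt")
        print(write_durably(path, b"do not lose this\n"), "bytes written")
```

Without the last two calls the data would normally still reach the disk, but only when the cache is next flushed; a power cut before that moment loses it. Python's os.sync() is the whole-system counterpart: it asks the kernel to flush all of its buffers, not just those of one file.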
-
- Another reason against just flipping the power switch is that in a
- multi-tasking system there can be lots of things going on in the background, and
- shutting off the power can be quite disastrous. This is especially true for
- machines that several people use at the same time.
-
- The commands for properly shutting down a Linux system are shutdown(8) and
- halt(8) (both are located in /sbin). There are two usual ways of using them.
-
- If you are running a system where you are the only user, the usual way of
- using shutdown is to quit all running programs, log out on all virtual consoles,
- log in as root on one of them (or stay logged in as root if you already are, but
- you should change to the root directory, to avoid problems with unmounting),
- then give the command shutdown -h now (substitute now with a plus sign
- and a number in minutes if you want a delay, though you usually don't on a
- single user system) or halt.
-
- Alternatively, if your system has many users, use the command shutdown -h
- +time message, where time is the time in minutes until the system is halted, and
- message is a short explanation of why the system is shutting down.
-
- root # shutdown -h +10 'We will install a new disk. System should
- > be back on-line in three hours.'
-
- This will warn everybody that the system will shut down in ten minutes, and that
- they'd better get lost or lose data. The warning is printed to every terminal
- on which someone is logged in, including all xterms.
-
- Broadcast message from root (ttyp0) Wed Aug 2 01:03:25 1995...
- We will install a new disk. System should
- be back on-line in three hours.
- The system is going DOWN for system halt in 10 minutes !!
-
- The warning is automatically repeated a few times before the shutdown, with
- shorter and shorter intervals as the time runs out. You can't use a delay with
- halt; it is seldom appropriate to use halt on a multiuser system.
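
The two ways of calling shutdown described above differ only in the delay and the message. As a small illustration, the Python sketch below builds the argument list for either case without running anything. The function name shutdown_command is made up; the only options used are the ones this chapter mentions (-h, now, a plus sign followed by minutes, the message, and -r from the section on rebooting).

```python
def shutdown_command(minutes=0, message="", reboot=False):
    """Build the argument list for shutdown(8) without running it."""
    if minutes < 0:
        raise ValueError("delay must not be negative")
    when = "now" if minutes == 0 else "+%d" % minutes
    argv = ["/sbin/shutdown", "-r" if reboot else "-h", when]
    if message:
        argv.append(message)     # the message is passed as one argument
    return argv


if __name__ == "__main__":
    print(shutdown_command())
    print(shutdown_command(10, "We will install a new disk."))
```

A wrapper like this only saves typing; the command itself must still be run as root.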
-
- META: /etc/inittab can give commands to execute when halting/rebooting
-
- When the real shutting down starts after any delays, user processes (if
- anybody is still logged in) are killed, daemons are shut down, all filesystems
- (except the root one) are unmounted, and generally everything settles down.
-
- When that is done, shutdown prints out a message that you can power down the
- machine. Then, and only then, should you move your fingers towards the power
- switch.
-
- Sometimes, although rarely on any good system, it is impossible to shut down
- properly. For instance, if the kernel panics and crashes and burns and
- generally misbehaves, it might be completely impossible to give any new
- commands, hence shutting down properly is somewhat difficult, and just about
- all you can do is hope that nothing has been too severely damaged and
- turn off the power. If the troubles are a bit less severe (say, somebody merely
- hit your keyboard with an axe), and the kernel and the update program still run
- normally, it is probably a good idea to wait a couple of minutes to give
- update(8) a chance to flush the buffer cache, and only cut the power after that.
-
- Some people like to shut down using the command sync(8)4 three times, waiting
- for the disk I/O to stop, then turn off the power. If there are no running
- programs, this is about equivalent to using shutdown. However, it does not
- unmount any filesystems and this can lead to problems with the ext2fs "clean
- filesystem" flag. The triple-sync method is not recommended.
-
- (In case you're wondering: the reason for three syncs is that in the early
- days of UNIX, when the commands were typed separately, that usually gave
- sufficient time for most disk I/O to be finished.)
-
- 3.4 Rebooting
-
-
-
- Rebooting means booting the system again. This can be accomplished by first
- shutting it down completely, turning power off, and then turning it back on. A
- simpler way is to ask shutdown to reboot the system, instead of merely halting
- it. This is accomplished by using the -r option to shutdown, for example, by
- giving the command shutdown -r now. You can also use the reboot command (which,
- like halt, doesn't wait before it perpetrates its foul deed).
- _____________________________4
- sync flushes the buffer cache.
-
- 3.5 Single user mode
-
- The shutdown command can also be used to bring the system down to single user
- mode, in which no one can log in, but root can use the console. This is useful
- for system administration tasks that can't be done while the system is running
- normally.
-
- Single user mode is discussed more thoroughly in chapter ??.
-
- 3.6 Emergency boot floppies
-
- It is not always possible to boot a computer from the hard disk. For example, if
- you make a mistake in configuring LILO, you might make your system unbootable.
- For these situations, you need an alternative way of booting that will always
- work (as long as the hardware works). For typical PCs, this means booting from
- the floppy drive.
-
-
- Most Linux distributions allow one to create an emergency boot floppy during
- installation. It is a good idea to do this. However, many such boot disks
- contain only the kernel, and assume you will be using the programs on the
- distribution's installation disks to fix whatever problem you have. Sometimes
- those programs aren't enough; for example, you might have to restore some files
- from backups made with software not on the installation disks.
-
-
- Thus, it might be necessary to create a custom root floppy as well. The
- Bootdisk HOWTO by Graham Chapman contains instructions for doing this. You
- must, of course, remember to keep your emergency boot and root floppies up to
- date.
-
-
- You can't use the floppy drive you use to mount the root floppy for anything
- else. This can be inconvenient if you only have one floppy drive. However, if
- you have enough memory, you can configure your boot floppy to load the root disk
- to a ramdisk (the boot floppy's kernel needs to be specially configured for
- this). This frees the floppy drive after the root floppy has been loaded to a
- ramdisk.
-
-
-
-
-
- Chapter 4
-
-
-
- Using Disks and Other Storage Media
-
-
-
- On a clear disk you can seek forever.
- When you install or upgrade your system, you need to do a fair amount of work on
- your disks. You have to make filesystems on your disks so that files can be
- stored on them and reserve space for the different parts of your system.
-
-
- This chapter explains all these initial activities. Usually, once you get
- your system set up, you won't have to go through the work again, except for
- using floppies. You'll need to come back to this chapter if you add a new disk
- or want to fine-tune your disk usage.
-
-
- The basic tasks in administering disks are:
-
- o Format your disk. This does various things to prepare it for use, such as
- checking for bad sectors. (Formatting is nowadays not necessary for most hard
- disks.)
-
- o Partition a hard disk, if you want to use it for several activities that
- aren't supposed to interfere with one another. One reason for partitioning is to
- store different operating systems on the same disk. Another reason is to keep
- user files separate from system files, which simplifies back-ups and helps
- protect the system files from corruption.
-
-
- o Make a filesystem (of a suitable type) on each disk or partition. The disk
- means nothing to Linux until you make a filesystem; then files can be created
- and accessed on it.
-
-
-
-
-
- o Mount different filesystems to form a single tree structure, either
- automatically, or manually as needed. (Manually mounted filesystems usually need
- to be unmounted manually as well.)
-
- Chapter 6 contains information about virtual memory and disk caching, which
- you also need to be aware of when using disks.
-
- This chapter explains what you need to know for hard disks and floppies.
- Unfortunately, because I lack the equipment, I cannot tell you much about using
- other types of media, such as tapes or CD-ROMs.
-
- 4.1 Two kinds of devices
-
- UNIX, and therefore Linux, recognizes two different kinds of devices:
- random-access block devices (such as disks), and character devices (such as
- tapes and serial lines), some of which may be serial, and some random-access.
- Each supported device is represented in the filesystem as a device file. When
- you read or write a device file, the data comes from or goes to the device it
- represents. This way no special programs (and no special application programming
- methodology, such as catching interrupts or polling a serial port) are necessary
- to access devices; for example, to send a file to the printer, one could just
- say
-
- ttyp5 root ~ $ cat filename > /dev/lp1
- ttyp5 root ~ $
-
- and the contents of the file are printed (the file must, of course, be in a form
- that the printer understands). However, since it is not a good idea to have
- several people cat their files to the printer at the same time, one usually uses
- a special program to send the files to be printed (usually lpr(1)). This program
- makes sure that only one file is being printed at a time, and will automatically
- send files to the printer as soon as it finishes with the previous file.
- Something similar is needed for most devices. In fact, one seldom needs to worry
- about device files at all.
-
- Since devices show up as files in the filesystem (in the /dev directory), it
- is easy to see just what device files exist, using ls(1) or another suitable
- command. In the output of ls -l, the first column contains the type of the file
- and its permissions. For example, inspecting a serial device gives on my system
-
- ttyp5 root ~ $ ls -l /dev/cua0
- crw-rw-rw- 1 root uucp 5, 64 Nov 30 1993 /dev/cua0
- ttyp5 root ~ $
-
-
- The first character in the first column, i.e., `c' in crw-rw-rw- above, tells an
- informed user the type of the file, in this case a character device. For
- ordinary files, the first character is `-', for directories it is `d', and for
- block devices `b'; see the ls(1) man page for further information.
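
The same information that ls -l shows can be obtained by a program. The Python sketch below reports the type character for a path and, for device files, the two numbers that ls prints before the date (the major and minor device numbers). The function names are invented for the example, and only the four file types mentioned above are handled.

```python
import os
import stat


def file_type_char(path):
    """Return the character ls -l would show first for this path."""
    mode = os.lstat(path).st_mode
    if stat.S_ISCHR(mode):
        return "c"
    if stat.S_ISBLK(mode):
        return "b"
    if stat.S_ISDIR(mode):
        return "d"
    if stat.S_ISREG(mode):
        return "-"
    return "?"          # other types (links, pipes, sockets) are not covered here


def device_numbers(path):
    """Return (major, minor) for a device file, as shown by ls -l."""
    info = os.lstat(path)
    if not (stat.S_ISCHR(info.st_mode) or stat.S_ISBLK(info.st_mode)):
        raise ValueError("%s is not a device file" % path)
    return os.major(info.st_rdev), os.minor(info.st_rdev)


if __name__ == "__main__":
    for path in ("/dev/null", "/dev", "/"):
        print(file_type_char(path), path)
    print("device numbers of /dev/null:", device_numbers("/dev/null"))
```

Note that a device file existing says nothing about whether the hardware is present; the sketch only inspects the file itself.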
-
- Note that usually all device files exist even though the device itself might
- not be installed. So just because you have a file /dev/sda, it doesn't mean
- that you really do have an SCSI hard disk. Having all the device files makes the
- installation programs simpler, and makes it easier to add new hardware (there is
- no need to find out the correct parameters for and create the device files for
- the new device).
-
- 4.2 Hard disks
-
-
-
- This subsection introduces terminology related to hard disks. If you already
- know the terms and concepts, you can skip this subsection.
-
- See figure 4.1 for a schematic picture of the important parts in a hard disk.
- A hard disk consists of one or more circular platters,1 of which either or both
- surfaces are coated with a magnetic substance used for recording the data. For
- each surface, there is a read-write head that examines or alters the recorded
- data. The platters rotate on a common axis; a typical rotation speed is 3600
- rotations per minute, although high-performance hard disks have higher speeds.
- The heads move along the radius of the platters; this movement combined with the
- rotation of the platters allows the head to access all parts of the surfaces.
-
- The processor (CPU) and the actual disk communicate through a disk
- controller. This relieves the rest of the computer from knowing how to use the
- drive, since the controllers for different types of disks can be made to use the
- same interface towards the rest of the computer. Therefore, the computer can
- say just "hey disk, gimme what I want", instead of a long and complex series of
- electric signals to move the head to the proper location and waiting for the
- correct position to come under the head and doing all the other unpleasant stuff
- necessary. (In reality, the interface to the controller is still complex, but
- much less so than it would otherwise be.) The controller can also do some other
- stuff, such as caching, or automatic bad sector replacement.
-
- The above is usually what one needs to understand about the hardware. There
- is also a bunch of other stuff, such as the motor that rotates the platters and
- moves the heads, and the electronics that control the operation of the
- mechanical parts, but that is mostly not relevant for understanding the working
- principle of a hard disk.
- _____________________________1
- The platters are made of a hard substance, e.g., aluminium, which gives the
- hard disk its name.
-
- The surfaces are usually divided into concentric rings, called tracks, and
- these in turn are divided into sectors. This division is used to specify
- locations on the hard disk and to allocate disk space to files. To find a given
- place on the hard disk, one might say "surface 3, track 5, sector 7". Usually
- the number of sectors is the same for all tracks, but some hard disks put more
- sectors in outer tracks (all sectors are of the same physical size, so more of
- them fit in the longer outer tracks). Typically, a sector will hold 512 bytes of
- data. The disk itself can't handle smaller amounts of data than one sector.
-
- Figure 4.1: A schematic picture of a hard disk.
-
-
-
- Each surface is divided into tracks (and sectors) in the same way. This means
- that when the head for one surface is on a track, the heads for the other
- surfaces are also on the corresponding tracks. All the corresponding tracks
- taken together are called a cylinder. It takes time to move the heads from one
- track (cylinder) to another, so by placing the data that is often accessed
- together (say, a file) so that it is within one cylinder, it is not necessary to
- move the heads to read all of it. This improves performance. It is not always
- possible to place files like this; files that are stored in several places on
- the disk are called fragmented.
-
- The number of surfaces (or heads, which is the same thing), cylinders, and
- sectors vary a lot; the specification of the number of each is called the
- geometry of a hard disk. The geometry is usually stored in a special,
- battery-powered memory location called the CMOS RAM, from where the operating
- system can fetch it during bootup or driver initialization.
-
-
- Unfortunately, the BIOS2 has a design limitation, which makes it impossible
- to specify a track number that is larger than 1024 in the CMOS RAM, which is too
- little for a large hard disk. To overcome this, the hard disk controller lies
- about the geometry, and translates the addresses given by the computer into
- something that fits reality. For example, a hard disk might have 8 heads, 2048
- tracks, and 35 sectors per track3. Its controller could lie to the computer and
- claim that it has 16 heads, 1024 tracks, and 35 sectors per track, thus not
- exceeding the limit on tracks, and translates the address that the computer
- gives it by halving the head number, and doubling the track number. The math can
- be more complicated in reality, because the numbers are not as nice as here (but
- again, the details are not relevant for understanding the principle). This
- translation distorts the operating system's view of how the disk is organized,
- thus making it impractical to use the all-data-on-one-cylinder trick to boost
- performance.
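
The imaginary example above can be written out as a short Python sketch. The mapping shown is only one possible way to "halve the head number and double the track number" and is not taken from any real controller; the names REAL, CLAIMED, and to_real are invented. Heads and tracks are counted from 0 and sectors from 1.

```python
REAL = {"heads": 8, "tracks": 2048, "sectors": 35}      # the drive's true geometry
CLAIMED = {"heads": 16, "tracks": 1024, "sectors": 35}  # what the controller reports


def to_real(track, head, sector):
    """Map an address in the claimed geometry to one in the real geometry."""
    if not (0 <= track < CLAIMED["tracks"] and 0 <= head < CLAIMED["heads"]
            and 1 <= sector <= CLAIMED["sectors"]):
        raise ValueError("address outside the claimed geometry")
    real_head = head // 2                 # halve the head number
    real_track = track * 2 + head % 2     # double the track number; odd claimed
                                          # heads land on odd-numbered real tracks
    return real_track, real_head, sector


if __name__ == "__main__":
    print(to_real(0, 0, 1))
    print(to_real(0, 1, 1))
    print(to_real(1023, 15, 35))
```

Both geometries describe the same number of sectors (8 x 2048 x 35 = 16 x 1024 x 35 = 573440), and every claimed address maps to a different real one. The sketch also shows why the all-data-on-one-cylinder trick breaks down: claimed heads 0 and 1 of the same claimed cylinder end up on different real tracks.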
-
- The translation is only a problem for IDE disks. SCSI disks use a sequential
- sector number (i.e., the controller translates a sequential sector number to
- head/cylinder/sector), and a completely different method for the CPU to talk
- with the controller, so they are insulated from the problem. Note, however, that
- the computer might not know the real geometry of an SCSI disk either.
-
- Since Linux often will not know the real geometry of a disk, its filesystems
- don't even try to keep files within a single cylinder. Instead, they try to
- assign sequentially numbered sectors to files, which almost always gives similar
- performance. The issue is further complicated by on-controller caches, and
- automatic prefetches done by the controller.
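
The relation between a head/cylinder/sector address and a sequential sector number is simple arithmetic. The Python sketch below uses the conventional ordering (all sectors of one track, then the next head of the same cylinder, then the next cylinder); a particular drive may number its sectors differently internally, so treat this as an illustration of the idea only. The function names are invented.

```python
def chs_to_linear(cylinder, head, sector, heads, sectors_per_track):
    """Convert cylinder/head/sector (sector counted from 1) to a sequential number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def linear_to_chs(number, heads, sectors_per_track):
    """Convert a sequential sector number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(number, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    # The claimed geometry from the earlier example: 16 heads, 35 sectors per track.
    print(chs_to_linear(0, 0, 1, 16, 35))
    print(chs_to_linear(1, 0, 1, 16, 35))
    print(linear_to_chs(560, 16, 35))
```

Sectors that are close together in the sequential numbering are usually close together on the disk as well, which is why assigning consecutive sector numbers to a file tends to give good performance.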
-
-
- Each hard disk is represented by a separate device file. There can (usually)
- be only two IDE hard disks. These are known as /dev/hda and /dev/hdb,
- respectively. SCSI hard disks are known as /dev/sda, /dev/sdb, and so on.
- Similar naming conventions exist for other hard disk types. Note that the device
- files for the hard disks give access to the entire disk, with no regard to
- partitions (which will be discussed below), and it's easy to mess up the
- partitions or the data in them if you aren't careful. The disks' device files
- are usually used only to get access to the master boot record (which will also
- be discussed below).
- _____________________________2
- The BIOS is some built-in software stored on ROM chips. It takes care, among
- other things, of the initial stages of booting.
- _____________________________3
- The numbers are completely imaginary.
-
-
- 4.3 Floppies
- A floppy disk consists of a flexible membrane covered on one or both sides with
- a magnetic substance similar to that on a hard disk. The floppy disk itself
- doesn't have a read-write head; that is included in the drive. A floppy
- corresponds to one platter in a hard disk, but is removable and one drive can be
- used to access different floppies, whereas the hard disk is one indivisible unit.
-
- Like a hard disk, a floppy is divided into tracks and sectors (and the two
- corresponding tracks on either side of a floppy form a cylinder), but there are
- many fewer of them than on a hard disk.
-
- A floppy drive can usually use several different types of disks; for example,
- a 3.5 inch drive can use both 720 kB and 1.44 MB disks. Since the drive has to
- operate a bit differently and the operating system must know how big the disk
- is, there are many device files for floppy drives, one per combination of drive
- and disk type. Therefore, /dev/fd0H1440 is the first floppy drive (fd0), which
- must be a 3.5 inch drive, using a 3.5 inch, high density disk (H) of size 1440
- kB (1440), i.e., a normal 3.5 inch HD floppy. For more information on the naming
- conventions for the floppy devices.
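
The structure of such a name can be taken apart mechanically. The Python sketch below splits a name like /dev/fd0H1440 into the drive number, the type letter, and the size in kilobytes. The pattern and the function name are assumptions made for the illustration; only the letter H is explained in the text above, so the sketch does not try to interpret the letter.

```python
import re

# fd, one digit for the drive, one letter for the drive and disk type, then the size.
NAME_PATTERN = re.compile(r"^/dev/fd(\d)([A-Za-z])(\d+)$")


def parse_floppy_device(path):
    """Split a name such as /dev/fd0H1440 into (drive number, type letter, size in kB)."""
    match = NAME_PATTERN.match(path)
    if match is None:
        raise ValueError("not a fixed-format floppy device name: %s" % path)
    drive, letter, size = match.groups()
    return int(drive), letter, int(size)


if __name__ == "__main__":
    print(parse_floppy_device("/dev/fd0H1440"))
```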
-
- The names for floppy drives are complex, however, and Linux therefore has a
- special floppy device type that automatically detects the type of the disk in
- the drive. It works by trying to read the first sector of a newly inserted
- floppy using different floppy types until it finds the correct one. This
- naturally requires that the floppy is formatted first. The automatic devices are
- called /dev/fd0, /dev/fd1, and so on.
-
- The parameters the automatic device uses to access a disk can also be set
- using the program setfdprm(8). This can be useful if you need to use disks that
- do not follow any usual floppy sizes, e.g., if they have an unusual number of
- sectors, or if the autodetecting for some reason fails and the proper device
- file is missing.
-
- Linux can handle many nonstandard floppy disk formats in addition to all the
- standard ones. Some of these require using special formatting programs. We'll
- skip these disk types for now.
-
-
-
- 4.4 Formatting
-
-
- Formatting is the process of writing marks on the magnetic media that are used
- to mark tracks and sectors. Before a disk is formatted, its magnetic surface is
- a complete mess of magnetic signals. When it is formatted, some order is brought
- into the chaos by essentially drawing lines where the tracks go, and where they
- are divided into sectors. The actual details are not quite exactly like this,
- but that is irrelevant. What is important is that a disk cannot be used unless
- it has been formatted.
-
- The terminology is a bit confusing here: in MS-DOS, the word formatting is
- used to cover also the process of creating a filesystem (which will be discussed
- below). There, the two processes are often combined, especially for floppies.
- When the distinction needs to be made, the real formatting is called low-level
- formatting, while making the filesystem is called high-level formatting. In UNIX
- circles, the two are called formatting and making a filesystem, so that's what
- is used in this book as well.
-
- For IDE and some SCSI disks the formatting is actually done at the factory
- and doesn't need to be repeated; hence most people rarely need to worry about
- it. In fact, formatting a hard disk can cause it to work less well, for example
- because a disk might need to be formatted in some very special way to allow
- automatic bad sector replacement to work.
-
- Disks that need to be, or can be, formatted often require a special program
- anyway, because the interface to the formatting logic inside the drive is
- different from drive to drive. The formatting program is often either in the
- controller BIOS, or is supplied as an MS-DOS program; neither of these can
- easily be used from within Linux.
-
- During formatting one might encounter bad spots on the disk, called bad
- blocks or bad sectors. These are sometimes handled by the drive itself, but even
- then, if more of them develop, something needs to be done to avoid using those
- parts of the disk. The logic to do this is built into the filesystem; how to add
- the information into the filesystem is described below. Alternatively, one might
- create a small partition that covers just the bad part of the disk; this approach
- might be a good idea if the bad spot is very large, since filesystems can
- sometimes have trouble with very large bad areas.
-
-
- Floppies are formatted with fdformat(8). The floppy device file to use is
- given as the parameter. For example, the following command would format a high
- density, 3.5 inch floppy in the first floppy drive:
-
- ttyp5 root " $ fdformat /dev/fd0H1440
- Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
- Formatting ... done
- Verifying ... done
- ttyp5 root " $
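
The capacity that fdformat reports follows directly from the geometry it prints. The arithmetic can be sketched in Python as below; this is only an illustration, not part of fdformat. The function name is made up, and the 512-byte sector size is the usual PC value, assumed here.

```python
SECTOR_SIZE = 512  # bytes per sector; the usual PC floppy value (assumption)


def floppy_capacity_kb(sides, tracks, sectors_per_track):
    """Return the formatted capacity in kB (1 kB = 1024 bytes)."""
    total_bytes = sides * tracks * sectors_per_track * SECTOR_SIZE
    return total_bytes // 1024


# Double-sided, 80 tracks, 18 sectors per track, as in the transcript above.
print(floppy_capacity_kb(2, 80, 18))  # 1440
```
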
-
-
- Note that if you want to use an autodetecting device (e.g., /dev/fd0), you must
- set the parameters of the device with setfdprm(8) first. To achieve the same
- effect as above, one would have to do the following:
-
- ttyp5 root " $ setfdprm /dev/fd0 1440/1440
- ttyp5 root " $ fdformat /dev/fd0
- Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
- Formatting ... done
- Verifying ... done
- ttyp5 root " $
-
- It is usually more convenient to choose the correct device file that matches the
- type of the floppy. Note that it is unwise to format floppies to contain more
- information than what they are designed for.
-
-
- fdformat will also validate the floppy, i.e., check it for bad blocks. It
- will try a bad block several times (you can usually hear this, as the drive
- noise changes dramatically). If the floppy is only marginally bad (some errors
- are false signals caused by dirt on the read/write head), fdformat won't
- complain, but a real error will abort the validation process. The kernel will
- print log messages for each I/O error it finds; these will go to the console
- or, if syslog is being used, to the file /usr/adm/messages. fdformat itself
- won't tell where the error is (one usually doesn't care, since floppies are
- cheap enough that a bad one is automatically thrown away).
-
- ttyp5 root " $ fdformat /dev/fd0H1440
- Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
- Formatting ... done
- Verifying ... read: Unknown error
- ttyp5 root " $
-
-
- The badblocks(8) command can be used to search any disk or partition for bad
- blocks (including a floppy). It does not format the disk, so it can be used to
- check even existing filesystems. The example below checks a 3.5 inch floppy with
- two bad blocks.
-
- ttyp5 root " $ badblocks /dev/fd0H1440 1440
-
- 718
- 719
-
- ttyp5 root " $
-
-
- badblocks outputs the block numbers of the bad blocks it finds. Most filesystems
- can avoid such bad blocks. They maintain a list of known bad blocks, which is
- initialized when the filesystem is made, and can be modified later. The initial
- search for bad blocks can be done by the mkfs command (which initializes the
- filesystem), but later checks should be done with badblocks and the new blocks
- should be added with fsck. We'll describe mkfs and fsck later.
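
To make the block numbers concrete, the following Python sketch turns badblocks output into a list and shows which sectors each bad block covers. It is an illustration only: the function names are hypothetical, the block size of 1 kB is inferred from the example above (1440 blocks on a 1440 kB floppy), and sectors are assumed to be 512 bytes and counted from zero.

```python
def parse_badblocks_output(output):
    """Turn the text printed by badblocks (one number per line) into a list."""
    return [int(word) for word in output.split()]


def sectors_of_block(block, block_size=1024, sector_size=512):
    """Return the sector numbers (counted from 0) that make up one block."""
    per_block = block_size // sector_size
    first = block * per_block
    return list(range(first, first + per_block))


bad = parse_badblocks_output("718\n719\n")
for block in bad:
    print(block, sectors_of_block(block))
```
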
-
- 4.5 Partitions
-
- A hard disk can be divided into several partitions. Each partition functions as
- if it were a separate hard disk. The idea is that if you have one hard disk, and
- want to have, say, two operating systems on it, you can divide the disk into two
- partitions. Each operating system uses its partition as it wishes and doesn't
- touch the other one's. This way the two operating systems can co-exist
- peacefully on the same hard disk. Without partitions one would have to buy a
- hard disk for each operating system.
-
- Floppies are not partitioned. There is no technical reason against this, but
- since they're so small, partitions would be useful only very rarely.
-
- 4.5.1 The MBR, boot sectors and partition table
-
- The information about how a hard disk has been partitioned is stored in its
- first sector (that is, the first sector of the first track on the first disk
- surface). The first sector is the master boot record (MBR) of the disk; this is
- the sector that the BIOS reads in and starts when the machine is first booted.
- The master boot record contains a small program that reads the partition table,
- checks which partition is active (that is, marked bootable), and reads the first
- sector of that partition, the partition's boot sector (the MBR is also a boot
- sector, but it has a special status and therefore a special name). This boot
- sector contains another small program that reads the first part of the operating
- system stored on that partition (assuming it is bootable), and then starts it.
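
What the small program in the MBR does with the partition table can be sketched in Python. The byte offsets below (table at offset 446, four 16-byte entries, signature 55 AA in the last two bytes, 0x80 marking the active entry) are the conventional PC layout and are not taken from this guide; the function name is hypothetical. The sketch works on a 512-byte buffer and never touches a real disk.

```python
import struct

TABLE_OFFSET = 446   # conventional PC layout (assumption, not from the guide)
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"


def parse_mbr(sector):
    """Return the non-empty entries of the partition table in a 512-byte MBR."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing boot signature")
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + i * ENTRY_SIZE:TABLE_OFFSET + (i + 1) * ENTRY_SIZE]
        # status byte, 3 bytes skipped, type byte, 3 bytes skipped,
        # first sector and sector count as little-endian 32-bit integers
        status, ptype, start, count = struct.unpack("<B3xB3xII", raw)
        if ptype == 0:      # type 0 means the slot is empty
            continue
        entries.append({"number": i + 1, "active": status == 0x80,
                        "type": ptype, "start": start, "sectors": count})
    return entries


# Build a synthetic MBR with one active partition of type 83 (Linux native).
mbr = bytearray(512)
mbr[446:462] = struct.pack("<B3xB3xII", 0x80, 0x83, 63, 204800)
mbr[510:512] = BOOT_SIGNATURE
for entry in parse_mbr(bytes(mbr)):
    print(entry)
```
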
-
- The partitioning scheme is not built into the hardware, or even into the
- BIOS. It is only a convention that many operating systems follow. Not all
- operating systems do follow it, but they are the exceptions. Some operating
- systems support partitions, but occupy a single partition on the hard disk and
- use their own internal partitioning method within that partition. The latter
- type co-exists peacefully with other operating systems (including Linux), and
- does not require any special measures, but an operating system that doesn't
- support partitions cannot co-exist on the same disk with any other operating
- system.
-
-
- As a safety precaution, it is a good idea to write down the partition table
- on a piece of paper, so that if it is ever corrupted you don't have to lose all
- your files. (A bad partition table can be fixed with fdisk.)
-
- 4.5.2 Extended and logical partitions
-
-
- The original partitioning scheme for PC hard disks allowed only four partitions.
- This quickly turned out to be too few in real life, partly because some
- people want more than four operating systems (Linux, MS-DOS, OS/2, Minix,
- FreeBSD, NetBSD, or Windows/NT, to name a few), but primarily because sometimes
- it is a good idea to have several partitions for one operating system. For
- example, swap space is usually best put in its own partition for Linux instead
- of in the main Linux partition for reasons of speed (see below).
-
- To overcome this design problem, extended partitions were invented. This
- trick allows partitioning a primary partition into sub-partitions. The primary
- partition thus subdivided is the extended partition; the subpartitions are
- logical partitions. They behave like primary[4] partitions, but are created
- differently.
-
- The partition structure of a hard disk might look like that in figure 4.2.
- The disk is divided into three primary partitions, the second of which is
- divided into two logical partitions. Part of the disk is not partitioned at all.
- The disk as a whole and each primary partition have a boot sector.
-
- 4.5.3 Partition types
-
- The partition tables (the one in the MBR, and the ones for extended partitions)
- contain one byte per partition that identifies the type of that partition. This
- attempts to identify the operating system that uses the partition, or what it
- uses it for. The purpose is to make it possible to avoid having two operating
- systems accidentally using the same partition. However, in reality, operating
- systems do not really care about the partition type byte; e.g., Linux doesn't
- care at all what it is. Worse, some of them use it incorrectly; e.g., at least
- some versions of DR-DOS ignore the most significant bit of the byte, while
- others don't.
- _____________________________
- [4] Illogical?
-
-
-
-
- Figure 4.2: A sample hard disk partitioning.
-
- There is no standardization agency to specify what each byte value means, but
- some commonly accepted ones are listed in table 4.1. The same list is available
- in the Linux fdisk(8) program.
-
-
-
- Table 4.1: Partition types (from the Linux fdisk(8) program).
-
- _________________________________________________________
-  0  DOS Empty             40  Venix 80286     94  Amoeba BBT
-
-
-
- 4.5.4 Partitioning a hard disk
-
-
- There are many programs for creating and removing partitions. Most operating
- systems have their own, and it can be a good idea to use each operating system's
- own, just in case it does something unusual that the others can't. Many of the
- programs are called fdisk, including the Linux one, or variations thereof.
- Details on using the Linux fdisk are given on its man page. The cfdisk command
- is similar to fdisk, but has a nicer (full screen) user interface.
-
- When using IDE disks, the boot partition (the partition with the bootable
- kernel image files) must be completely within the first 1024 cylinders. This is
- because the disk is used via the BIOS during boot (before the system goes into
- protected mode), and BIOS can't handle more than 1024 cylinders. It is sometimes
- possible to use a boot partition that is only partly within the first 1024
- cylinders. This works as long as all the files that are read with the BIOS are
- within the first 1024 cylinders. Since this is difficult to arrange, it is a
- very bad idea to do it; you never know when a kernel update or disk
- defragmentation will result in an unbootable system. Therefore, make sure your
- boot partition is completely within the first 1024 cylinders.
-
-
- Some newer versions of the BIOS and IDE disks can, in fact, handle disks with
- more than 1024 cylinders. If you have such a system, you can forget about the
- problem; if you aren't quite sure, put the boot partition within the first 1024
- cylinders.
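
How much of a disk lies within the first 1024 cylinders depends on the geometry. The Python sketch below shows the arithmetic; it is an illustration only. The function names are hypothetical, and the geometry of 16 heads and 63 sectors per track with 512-byte sectors is an assumed example, not a value from this guide.

```python
BIOS_MAX_CYLINDERS = 1024  # the limit described in the text


def bios_reachable_bytes(heads, sectors_per_track, sector_size=512):
    """Bytes that lie within the first 1024 cylinders for a given geometry."""
    return BIOS_MAX_CYLINDERS * heads * sectors_per_track * sector_size


def boot_partition_is_safe(last_cylinder):
    """True if a partition whose last cylinder is last_cylinder (counted
    from 0) lies completely within the first 1024 cylinders."""
    return 0 <= last_cylinder < BIOS_MAX_CYLINDERS


# Hypothetical geometry: 16 heads, 63 sectors per track.
print(bios_reachable_bytes(16, 63))   # 528482304 bytes, i.e. 504 * 2**20
print(boot_partition_is_safe(1023))   # True
print(boot_partition_is_safe(1024))   # False
```
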
-
- Each partition should have an even number of sectors, since the Linux
- filesystems use a 1 kB block size, i.e., two sectors. An odd number of sectors
- will result in the last sector being unused. This won't result in any problems,
- but it is ugly, and some versions of fdisk will warn about it.
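
The effect of an odd sector count can be shown with a line of arithmetic. A minimal Python sketch, assuming 512-byte sectors and the 1 kB block size mentioned above (the function name is hypothetical):

```python
def blocks_and_leftover(sectors, sectors_per_block=2):
    """Return (usable 1 kB blocks, sectors left unused at the end)."""
    return divmod(sectors, sectors_per_block)


print(blocks_and_leftover(2880))  # (1440, 0): every sector is used
print(blocks_and_leftover(2881))  # (1440, 1): the last sector is unused
```
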
-
- Changing a partition's size usually requires first backing up everything you
- want to save from that partition (preferably the whole disk, just in case),
- deleting the partition, creating a new partition, then restoring everything to the
- new partition. There is a program for MS-DOS, called fips, which does this
- without requiring the backup and restore, but for other filesystems it is still
- necessary.
-
- 4.5.5 Device files and partitions
-
- Each partition and extended partition has its own device file. The naming
- convention for these files is that a partition's number is appended after the
- name of the whole disk, with the convention that 1-4 are primary partitions
- (regardless of how many primary partitions there are) and 5-8 are logical
- partitions (regardless of within which primary partition they reside). For
- example, /dev/hda1 is the first primary partition on the first IDE hard disk,
- and /dev/sdb7 is the third logical partition on the second SCSI hard disk.
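
The naming convention can be expressed as a small Python function. This is a sketch for illustration, not a tool that exists on the system: the function name and the returned dictionary are made up, and only the hd (IDE) and sd (SCSI) prefixes used in the examples above are recognized.

```python
import re

_DEVICE_RE = re.compile(r"^/dev/(hd|sd)([a-z])(\d+)?$")


def describe_device(path):
    """Decode a disk or partition device file name such as /dev/sdb7."""
    match = _DEVICE_RE.match(path)
    if match is None:
        raise ValueError("not an IDE or SCSI disk device name: %r" % path)
    prefix, letter, number = match.groups()
    info = {
        "bus": "IDE" if prefix == "hd" else "SCSI",
        "disk": ord(letter) - ord("a") + 1,   # a = first disk, b = second, ...
    }
    if number is None:
        info["kind"] = "whole disk"
        return info
    n = int(number)
    if n < 1:
        raise ValueError("partition numbers start at 1")
    # 1-4 are primary partitions; 5 and up are logical partitions.
    info["kind"] = "primary" if n <= 4 else "logical"
    info["index"] = n if n <= 4 else n - 4
    return info


print(describe_device("/dev/hda1"))
print(describe_device("/dev/sdb7"))
```
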
-
-
- 4.6 Filesystems
-
- 4.6.1 What are filesystems?
-
- A filesystem is the methods and data structures that an operating system uses
- to keep track of files on a disk or partition; that is, the way the files are
- organized on the disk. The word is also used to refer to a partition or disk
- that is used to store the files, or to the type of the filesystem. Thus, one
- might say "I have two filesystems", meaning one has two partitions on which one
- stores files, or that one is using the "extended filesystem", meaning the type
- of the filesystem.
-
- The difference between a disk or partition and the filesystem it contains is
- important. A few programs (including, reasonably enough, programs that create
- filesystems) operate directly on the raw sectors of a disk or partition; if
- there is an existing filesystem there it will be destroyed or seriously
- corrupted. Most programs operate on a filesystem, and therefore won't work on a
- partition that doesn't contain one (or that contains one of the wrong type).
-
- Before a partition or disk can be used as a filesystem, it needs to be
- initialized, and the bookkeeping data structures need to be written to the disk.
- This process is called making a filesystem.
-
- Most UNIX filesystem types have a similar general structure, although the
- exact details vary quite a bit. The central concepts are superblock, inode, data
- block, directory block, and indirection block. The superblock contains
- information about the filesystem as a whole, such as its size (the exact
- information here depends on the filesystem). An inode contains all information
- about a file, except its name. The name is stored in the directory, together
- with the number of the inode. A directory entry consists of a filename and the
- number of the inode which represents the file. The inode contains the numbers of
- several data blocks, which are used to store the data in the file. There is
- space only for a few data block numbers in the inode, however, and if more are
- needed, more space for pointers to the data blocks is allocated dynamically.
- These dynamically allocated blocks are indirect blocks; the name indicates that
- in order to find the data block, one has to find its number in the indirect
- block first.
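
The relationship between directory entries, inodes, and indirect blocks can be modelled with a toy in Python. This is a deliberately simplified sketch and not the on-disk layout of any real filesystem: the class name, the limit of three direct block numbers, and the single four-entry indirect block are all invented for illustration.

```python
N_DIRECT = 3            # block numbers stored in the inode itself (made up)
PTRS_PER_INDIRECT = 4   # block numbers that fit in one indirect block (made up)


class ToyInode:
    """Everything about a file except its name."""

    def __init__(self):
        self.direct = []      # data block numbers kept in the inode
        self.indirect = None  # contents of the indirect block, once allocated

    def append_block(self, block_no):
        if len(self.direct) < N_DIRECT:
            self.direct.append(block_no)
            return
        if self.indirect is None:
            self.indirect = []          # allocate the indirect block on demand
        if len(self.indirect) >= PTRS_PER_INDIRECT:
            raise OSError("file too large for this toy inode")
        self.indirect.append(block_no)

    def lookup(self, index):
        """Return the data block number of the index-th block of the file."""
        if index < 0:
            raise IndexError(index)
        if index < N_DIRECT:
            return self.direct[index]
        if self.indirect is None:
            raise IndexError(index)
        # One extra step: the number has to be found in the indirect block.
        return self.indirect[index - N_DIRECT]


inode = ToyInode()
for block_no in range(100, 105):
    inode.append_block(block_no)

inodes = {7: inode}              # inode number -> inode
directory = {"notes.txt": 7}     # directory entry: filename -> inode number
print(inodes[directory["notes.txt"]].lookup(4))
```
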
-
- UNIX filesystems usually allow one to create a hole in a file (this is done
- with lseek(2); check the manual page), which means that the filesystem just
- pretends that at a particular place in the file there are just zero bytes, but
- no actual disk sectors are reserved for that place in the file (this means that
- the file will use a bit less disk space). This happens especially often for
- small binaries, Linux shared libraries, some databases, and a few other special
- cases. (Holes are implemented by storing a special value as the address of the
- data block in the indirect block or inode. This special address means that no
- data block is allocated for that part of the file, ergo, there is a hole in the
- file.)
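
A hole can be created from Python through os.lseek, which wraps lseek(2). The sketch below is illustrative only; the function name and file name are made up. Whether the hole actually saves disk space depends on the filesystem holding the temporary directory, so the numbers printed will vary from system to system.

```python
import os
import tempfile


def make_file_with_hole(path, hole_size):
    """Write some data, seek forward hole_size bytes, and write again."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"start")
        os.lseek(fd, hole_size, os.SEEK_CUR)  # skip ahead without writing
        os.write(fd, b"end")
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "holey")
    make_file_with_hole(path, 1024 * 1024)
    st = os.stat(path)
    print("apparent size:", st.st_size)
    # st_blocks counts 512-byte units actually allocated to the file.
    print("allocated bytes:", st.st_blocks * 512)
```

Reading the skipped region returns zero bytes, exactly as the text describes.
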
-
- Holes are moderately useful. On the author's system, a simple measurement
- showed a potential for about 4 MB of savings through holes, out of about 200 MB
- of total used disk space. That system, however, contains relatively few programs
- and no database files. The measurement tool is described in appendix B.
-
- 4.6.2 Filesystems galore
-
- Linux supports several types of filesystems. As of this writing the most
- important ones are:
-
- minix    The oldest, presumed to be the most reliable, but quite limited in
-          features (some time stamps are missing, at most 30 character
-          filenames) and restricted in capabilities (at most 64 MB per
-          filesystem).
-
- xia      A modified version of the minix filesystem that lifts the limits on
-          the filenames and filesystem sizes, but does not otherwise introduce
-          new features. It is not very popular, but is reported to work very
-          well.
-
- ext2     The most featureful of the native Linux filesystems, currently also
-          the most popular one. It is designed to be easily upwards compatible,
-          so that new versions of the filesystem code do not require re-making
-          the existing filesystems.
-
- ext      An older version of ext2 that wasn't upwards compatible. It is hardly
-          ever used in new installations any more, and most people have
-          converted to ext2.
-
- In addition, support for several foreign filesystems exists, to make it easier
- to exchange files with other operating systems. These foreign filesystems work
- just like native ones, except that they may be lacking in some usual UNIX
- features, or have curious limitations, or other oddities.
-
- msdos    Compatibility with MS-DOS (and OS/2 and Windows NT) FAT filesystems.
-
- umsdos   Extends the msdos filesystem driver under Linux so that Linux can see
-          long filenames, owners, permissions, links, and device files. This
-          allows a normal msdos filesystem to be used as if it were a Linux
-          one, thus removing the need for a separate partition for Linux.
-
- iso9660  The standard CD-ROM filesystem; the popular Rock Ridge extension to
-          the CD-ROM standard that allows longer file names is supported
-          automatically.
-
- nfs      A networked filesystem that allows sharing a filesystem between many
-          computers to allow easy access to the files from all of them.
-
- hpfs     The OS/2 filesystem.
-
- sysv     SystemV/386, Coherent, and Xenix filesystems.
-
-
- META: ifs, userfs
-
- The choice of filesystem to use depends on the situation. If compatibility or
- other reasons make one of the non-native filesystems necessary, then that one
- must be used. If one can choose freely, then it is probably wisest to use ext2,
- since it has all the features but does not suffer from lack of performance.
-
- There is also the proc filesystem, usually accessible as the /proc directory,
- which is not really a filesystem at all, even though it looks like one. The proc
- filesystem makes it easy to access certain kernel data structures, such as the
- process list (hence the name). It makes these data structures look like a
- filesystem, and that filesystem can be manipulated with all the usual file
- tools. For example, to get a listing of all processes one might use the command
-
- ttyp5 root " $ ls -l /proc
- total 0
- dr-xr-xr-x 4 root root 0 Jan 31 20:37 1
- dr-xr-xr-x 4 liw users 0 Jan 31 20:37 63
- dr-xr-xr-x 4 liw users 0 Jan 31 20:37 94
- dr-xr-xr-x 4 liw users 0 Jan 31 20:37 95
- dr-xr-xr-x 4 root users 0 Jan 31 20:37 98
- dr-xr-xr-x 4 liw users 0 Jan 31 20:37 99
- -r--r--r-- 1 root root 0 Jan 31 20:37 devices
- -r--r--r-- 1 root root 0 Jan 31 20:37 dma
- -r--r--r-- 1 root root 0 Jan 31 20:37 filesystems
- -r--r--r-- 1 root root 0 Jan 31 20:37 interrupts
- -r-------- 1 root root 8654848 Jan 31 20:37 kcore
- -r--r--r-- 1 root root 0 Jan 31 11:50 kmsg
- -r--r--r-- 1 root root 0 Jan 31 20:37 ksyms
- -r--r--r-- 1 root root 0 Jan 31 11:51 loadavg
- -r--r--r-- 1 root root 0 Jan 31 20:37 meminfo
- -r--r--r-- 1 root root 0 Jan 31 20:37 modules
- dr-xr-xr-x 2 root root 0 Jan 31 20:37 net
- dr-xr-xr-x 4 root root 0 Jan 31 20:37 self
- -r--r--r-- 1 root root 0 Jan 31 20:37 stat
- -r--r--r-- 1 root root 0 Jan 31 20:37 uptime
- -r--r--r-- 1 root root 0 Jan 31 20:37 version
- ttyp5 root " $
-
-
-
- (There will be a few extra files that don't correspond to processes, though. The
- above example has been shortened.)
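
Because the process entries in /proc are simply directories whose names are process numbers, a program can pick them out with ordinary file operations. A minimal Python sketch (the function names are made up for illustration, and the directory is a parameter so that the code can also be tried on systems without /proc):

```python
import os


def pids_from_names(names):
    """Keep only the entries whose names are process numbers."""
    return sorted(int(name) for name in names
                  if name.isascii() and name.isdigit())


def list_pids(proc_dir="/proc"):
    """Return the process numbers visible in a proc-style directory."""
    return pids_from_names(os.listdir(proc_dir))


if os.path.isdir("/proc"):
    print(list_pids()[:5])
```
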
-
- Note that even though it is called a filesystem, no part of the proc
- filesystem touches any disk. It exists only in the kernel's imagination.
- Whenever anyone tries to look at any part of the proc filesystem, the kernel
- makes it look as if the part existed somewhere, even though it doesn't. So, even
- though there is a multi-megabyte /proc/kcore file, it doesn't take any disk
- space.
-
- 4.6.3 Which filesystem should be used?
-
- There is usually little point in using many different filesystems. Currently,
- ext2fs is the most popular one, and it is probably the wisest choice. Depending
- on the overhead for bookkeeping structures, speed, (perceived) reliability,
- compatibility, and various other reasons, it may be advisable to use another
- filesystem. This needs to be decided on a case-by-case basis.
-
- 4.6.4 Creating a filesystem
-
- Filesystems are created, i.e., initialized, with the mkfs(8) command. There is
- actually a separate program for each filesystem type. mkfs is just a front end
- that runs the appropriate program depending on the desired filesystem type. The
- type is selected with the -t fstype option.
-
- The programs called by mkfs have slightly different command line interfaces.
- The common and most important options are summarized below; see the manual
- pages for more.
-
- -t fstype    Select the type of the filesystem.
- -c           Search for bad blocks and initialize the bad block list accordingly.
- -l filename  Read the initial bad block list from the file filename.
-
- To create an ext2 filesystem on a floppy, one would give the following commands:
-
- ttyp6 root " $ fdformat -n /dev/fd0H1440
- Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
- Formatting ... done
- ttyp6 root " $ badblocks /dev/fd0H1440 1440 > bad-blocks
- ttyp6 root " $ mkfs -t ext2 -l bad-blocks /dev/fd0H1440
- mke2fs 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10
- 360 inodes, 1440 blocks
- 72 blocks (5.00%) reserved for the super user
- First data block=1
- Block size=1024 (log=0)
- Fragment size=1024 (log=0)
- 1 block group
- 8192 blocks per group, 8192 fragments per group
- 360 inodes per group
-
-
- Writing inode tables: done
- Writing superblocks and filesystem accounting information: done
- ttyp6 root " $
-
-
-
- First, the floppy was formatted (the -n option prevents validation, i.e., bad
- block checking). Then badblocks searched for bad blocks, with the output
- redirected to a file, bad-blocks. Finally, the filesystem was created, with the
- bad block list initialized by whatever badblocks found.
-
- The -c option could have been used with mkfs instead of badblocks and a
- separate file. The example below does that.
-
- ttyp6 root " $ mkfs -t ext2 -c /dev/fd0H1440
- mke2fs 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10
- 360 inodes, 1440 blocks
- 72 blocks (5.00%) reserved for the super user
- First data block=1
- Block size=1024 (log=0)
- Fragment size=1024 (log=0)
- 1 block group
- 8192 blocks per group, 8192 fragments per group
- 360 inodes per group
-
-
- Checking for bad blocks (read-only test): done
- Writing inode tables: done
- Writing superblocks and filesystem accounting information: done
- ttyp6 root " $
-
-
-
- The -c option is more convenient than a separate use of badblocks, but
- badblocks is necessary for checking after the filesystem has been created.
-
- The process of preparing filesystems on hard disks or partitions is the same
- as for floppies, except that the formatting isn't needed.
-
- 4.6.5 Mounting and unmounting
-
- Before one can use a filesystem, it has to be mounted. The operating system then
- does various bookkeeping things to make sure that everything works. Since all
- files in UNIX are in a single directory tree, the mount operation will make it
- look like the contents of the new filesystem are the contents of an existing
- subdirectory in some already mounted filesystem.
-
-
- For example, figure 4.3 shows three separate filesystems, each with its own
- root directory. When the last two filesystems are mounted below /home and /usr,
- respectively, on the first filesystem, we can get a single directory tree, as
- in figure 4.4.
-
-
-
-                    /                            /               /
-     ---------------|---------------         ----|----       ----|----
-     |     |     |     |     |     |         |   |   |       |   |   |
-    bin   dev   home  etc   lib   usr       abc liw ftp     bin etc lib
- Figure 4.3: Three separate filesystems.
-
-
-
- The mounts could be done as in the following example:
-
-
-
-                    /
-     ---------------|---------------
-     |     |     |     |     |     |
-    bin   dev   home  etc   lib   usr
-                 |                 |
-             ----|----         ----|----
-             |   |   |         |   |   |
-            abc liw ftp       bin etc lib
- Figure 4.4: /home and /usr have been mounted.
-
- ttyp6 root " $ mount /dev/hda2 /home
- ttyp6 root " $ mount /dev/hda3 /usr
- ttyp6 root " $
-
- The mount(8) command takes two arguments. The first one is the device file
- corresponding to the disk or partition containing the filesystem. The second
- one is the directory below which it will be mounted. After these commands the
- contents of the two filesystems look just like the contents of the /home and
- /usr directories, respectively. One would then say that "/dev/hda2 is mounted
- on /home", and similarly for /usr. To look at either filesystem, one would look
- at the contents of the directory on which it has been mounted, just as if it
- were any other directory. Note the difference between the device file,
- /dev/hda2, and the mounted-on directory, /home. The device file gives access to
- the raw contents of the disk; the mounted-on directory gives access to the files
- on the disk. The mounted-on directory is called the mount point.
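
One way to picture the result is as a table of mount points in which the longest matching mount point decides which filesystem a path belongs to. The Python sketch below models this idea only; it is not how the kernel is implemented. The function name is hypothetical, and /dev/hda1 as the root device is an assumption added to complete the example.

```python
import posixpath

# Mount point -> device file. /dev/hda1 for the root is an assumed example.
MOUNTS = {"/": "/dev/hda1", "/home": "/dev/hda2", "/usr": "/dev/hda3"}


def filesystem_for(path, mounts):
    """Return (device, mount point) of the filesystem that holds path."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The longest matching mount point wins.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise ValueError("no filesystem is mounted for %r" % path)
    return mounts[best], best


print(filesystem_for("/home/liw", MOUNTS))
print(filesystem_for("/etc/fstab", MOUNTS))
```
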
-
- The mounted-on directory need not be empty, although it must exist. Any files
- in it, however, will be inaccessible by name while the filesystem is mounted.
- (Any files that have already been opened will still be accessible. Files that
- have hard links from other directories can be accessed using those names.) There
- is no harm done with this, and it can even be useful. For instance, some people
- like to have /tmp and /usr/tmp synonymous, and make /tmp be a symbolic link to
- /usr/tmp. When the system is booted, before the /usr filesystem is mounted, a
- /usr/tmp directory residing on the root filesystem is used instead. When /usr is
- mounted, it will make the /usr/tmp directory on the root filesystem
- inaccessible. If /usr/tmp didn't exist on the root filesystem, it would be
- impossible to use temporary files before mounting /usr.
-
- If you don't intend to write anything to the filesystem, use the -r switch
- for mount to do a read-only mount. This will make the kernel stop any attempts
- at writing to the filesystem, and will also stop the kernel from updating file
- access times in the inodes. Read-only mounts are necessary for unwritable media,
- e.g., CD-ROMs.
-
- The alert reader has already noticed a slight logistical problem. How is the
- first filesystem (called the root filesystem, because it contains the root
- directory) mounted, since it obviously can't be mounted on another filesystem?
- Well, the answer is that it is done by magic.[5] The root filesystem is
- magically mounted at boot time, and one can rely on it to always be mounted; if
- the root filesystem can't be mounted, the system does not boot. The name of the
- filesystem that is magically mounted as root is either compiled into the kernel,
- or set using LILO or rdev.
-
-
- The root filesystem is usually first mounted read-only. The startup scripts
- will then run fsck(8) to verify its validity, and if there are no problems, they
- will re-mount it so that writes will also be allowed. fsck must not be run on a
- mounted filesystem, since any changes to the filesystem while fsck is running
- will cause trouble. Since the root filesystem is mounted read-only while it is
- being checked, fsck can fix any problems without worry, since the remount
- operation will flush any metadata that the filesystem keeps in memory.
-
- On many systems there are other filesystems that should also be mounted
- automatically at boot time. These are specified in the /etc/fstab file; see the
- fstab(5) man page for details on the format. The details of exactly when the
- extra filesystems are mounted depend on many factors, and can be configured by
- each administrator if need be. When the chapter on booting is finished, you may
- read all about it there.
-
- When a filesystem no longer needs to be mounted, it can be unmounted with
- umount(8).[6] umount takes one argument: either the device file or the mount
- point. For example, to unmount the directories of the previous example, one
- could use the commands
-
- ttyp6 root " $ umount /dev/hda2
- ttyp6 root " $ umount /usr
- ttyp6 root " $
-
- See the man page for further instructions on how to use the command. It is
- imperative that you always unmount a mounted floppy. Don't just pop the floppy
- out of the drive! Because of disk caching, the data is not necessarily written
- to the floppy until you unmount it, so removing the floppy from the drive too
- early might cause the contents to become garbled. If you just read from the
- floppy, this is not very likely, but if you write, even accidentally, the result
- may be catastrophic.
- _____________________________
- [5] For more information, see the kernel source or the Kernel Hackers' Guide.
- [6] It should of course be unmount(8), but the n mysteriously disappeared in
- the 70's, and hasn't been seen since. Please return it to Bell Labs, NJ, if you
- find it.
-
- Mounting and unmounting requires super user privileges, i.e., only root can
- do it. The reason for this is that if any user can mount a floppy on any
- directory, then it is rather easy to create a floppy with, say, a Trojan horse
- disguised as /bin/sh, or any other often used program. However, it is often
- necessary to allow users to use floppies, and there are several ways to do this:
-
-
- o Give the users the root password. This is obviously bad security, but is
- the easiest solution. It works well if there is no need for security anyway,
- which is the case on many non-networked, personal systems.
-
- o Use a program such as sudo(8) to allow users to use mount. This is still bad
- security, but doesn't directly give super user privileges to everyone.[7]
-
- o Make the users use mtools, a package for manipulating MS-DOS filesystems,
- without mounting them. This works well if MS-DOS floppies are the only thing
- that is needed, but is rather awkward otherwise.
-
- o List the floppy devices and their allowable mount points together with the
- suitable options in /etc/fstab.
-
- The last alternative can be implemented by adding a line like the following to
- /etc/fstab:
-
- /dev/fd0 /floppy msdos user,noauto
-
- The columns are: device file to mount, directory to mount on, filesystem type,
- and options. The noauto option prevents this mount from being done automatically
- when the system is started (i.e., it stops mount -a from mounting it). The user
- option allows any user to mount the filesystem, and, for security reasons,
- disallows execution of programs (normal or setuid) and interpretation of device
- files from the mounted filesystem. After this, any user can mount a floppy with
- an msdos filesystem with the following command:
-
- ttyp6 root " $ mount /floppy
- ttyp6 root " $
-
- The floppy can (and needs to, of course) be unmounted with the corresponding
- umount command.
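
The four columns of the /etc/fstab line shown earlier can be picked apart as in the Python sketch below. It is an illustration only and is not how mount itself reads the file: the function names are hypothetical, and only the four columns described above are interpreted.

```python
def parse_fstab_line(line):
    """Split one /etc/fstab line into its first four columns.

    Returns None for blank lines and for comment lines starting with '#'.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError("expected at least four columns: %r" % line)
    device, mount_point, fstype, options = fields[:4]
    return {
        "device": device,
        "mount_point": mount_point,
        "fstype": fstype,
        "options": options.split(","),
    }


def any_user_may_mount(entry):
    """True if the entry carries the user option."""
    return "user" in entry["options"]


entry = parse_fstab_line("/dev/fd0 /floppy msdos user,noauto")
print(entry)
print(any_user_may_mount(entry))
```
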
-
-
- META: What to do if several types of floppies are needed?
- _____________________________
- [7] It requires several seconds of hard thinking on the users' behalf.
-
-
-
- 4.6.6 Keeping filesystems healthy
-
- Filesystems are complex creatures, and as such, they tend to be somewhat
- error-prone. A filesystem's correctness and validity can be checked using the
- fsck(8) command. It can be instructed to repair any minor problems it finds, and
- to alert the user if there are any unrepairable problems. Fortunately, the code to
- implement filesystems is debugged quite effectively, so there are seldom any
- problems at all, and they are usually caused by power failures, failing
- hardware, or operator errors; for example, by not shutting down the system
- properly.
-
- Most systems are set up to run fsck automatically at boot time, so that any
- errors are detected (and hopefully corrected) before the system is used. Use of
- a corrupted filesystem tends to make things worse: if the data structures are
- messed up, using the filesystem will probably mess them up even more, resulting
- in more data loss. However, fsck can take a while to run on big filesystems, and
- since errors almost never occur if the system has been shut down properly, a
- couple of tricks are used to avoid doing the checks in such cases. The first is
- that if the file /etc/fastboot exists, no checks are made. The second is that
- the ext2 filesystem has a special marker in its superblock that tells whether
- the filesystem was unmounted properly after the previous mount. This allows
- e2fsck (the version of fsck for the ext2 filesystem) to avoid checking the
- filesystem if the flag indicates that the unmount was done (the assumption being
- that a proper unmount indicates no problems). Whether the /etc/fastboot trick
- works on your system depends on your startup scripts, but the ext2 trick works
- every time you use e2fsck; it has to be explicitly bypassed with an option to
- e2fsck. (See the e2fsck(8) man page for details on how.)
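-
- The /etc/fastboot convention amounts to a one-line test in the startup
- scripts. The sketch below is only an assumption about how such a script might
- be written; the function name boot_check_needed is invented, and the flag
- file path is a parameter so that the logic can be tried without touching the
- real /etc.
-
```shell
# Succeed (exit status 0) if the boot-time check should run, i.e. if the
# flag file does NOT exist. Fail (status 1) if the flag file is present.
boot_check_needed() {
    flag_file=$1
    if [ -e "$flag_file" ]; then
        return 1    # flag present: skip the check
    fi
    return 0        # no flag: check the filesystems
}

# How a startup script might use it (shown as a comment, not executed;
# the exact fsck invocation is an assumption, see fsck(8) on your system):
#   if boot_check_needed /etc/fastboot; then
#       fsck -A -a
#   fi
```
-
- Note that this only covers the first trick; the ext2 clean-unmount flag is
- handled inside e2fsck itself and needs no help from the scripts.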
-
- The automatic checking only works for the filesystems that are mounted
- automatically at boot time. Use fsck manually to check other filesystems, e.g.,
- floppies.
-
- If fsck finds unrepairable problems, you need either in-depth knowledge of how
- filesystems work in general, and the type of the corrupt filesystem in
- particular, or good backups. The latter is easy (although sometimes tedious) to
- arrange; the former can sometimes be arranged via a friend, the Linux newsgroups
- and mailing lists, or some other source of support, if you don't have the
- know-how yourself. I'd like to tell you more about it, but my lack of education
- and experience in this regard hinders me. The debugfs(8) program by Theodore
- Ts'o should be useful.
-
- fsck must only be run on unmounted filesystems, never on mounted filesystems
- (with the exception of the read-only root during startup). This is because it
- accesses the raw disk, and can therefore modify the filesystem without the
- operating system realizing it. There will be trouble, if the operating system is
- confused.
-
- It can be a good idea to periodically check for bad blocks. This is done with
- the badblocks command. It outputs a list of the numbers of all bad blocks it can
- find. This list can be fed to fsck to be recorded in the filesystem data
- structures so that the operating system won't try to use the bad blocks for
- storing data. The following example will show how this could be done.
-
- ttyp6 root ~ $ badblocks /dev/fd0H1440 1440 > bad-blocks
- ttyp6 root ~ $ fsck -t ext2 -l bad-blocks /dev/fd0H1440
- Parallelizing fsck version 0.5a (5-Apr-94)
- e2fsck 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10
- Pass 1: Checking inodes, blocks, and sizes
- Pass 2: Checking directory structure
- Pass 3: Checking directory connectivity
- Pass 4: Check reference counts.
- Pass 5: Checking group summary information.
-
- /dev/fd0H1440: ***** FILE SYSTEM WAS MODIFIED *****
- /dev/fd0H1440: 11/360 files, 63/1440 blocks
- ttyp6 root ~ $
-
- 4.7 Disks without filesystems
-
- Not all disks or partitions are used as filesystems. A swap partition, for
- example, will not have a filesystem on it. Many floppies are used in a
- tape-drive emulating fashion, so that a tar or other file is written directly on
- the raw disk, without a filesystem. This makes more of the disk usable (a
- filesystem always has some bookkeeping overhead) and makes the disk more easily
- compatible with other systems: the tar file format is the same on all systems,
- while filesystems differ between most systems. You will quickly get
- used to disks without filesystems if you need them. Bootable Linux floppies
- also do not necessarily have a filesystem, although that is also possible.
-
-
- One reason to use raw disks is to make image copies of them. For instance, if
- the disk contains a partially damaged filesystem, it is a good idea to make an
- exact copy of it before trying to fix it, since then you can start again if your
- fixing breaks things even more. One way to do this is to use dd(1):
-
- ttyp2 root /usr/tmp $ dd if=/dev/fd0H1440 of=floppy-image
- 2880+0 records in
- 2880+0 records out
- ttyp2 root /usr/tmp $ dd if=floppy-image of=/dev/fd0H1440
- 2880+0 records in
- 2880+0 records out
- ttyp2 root /usr/tmp $
-
- The first dd makes an exact image of the floppy to the file floppy-image, the
- second one writes the image to the floppy. (The user has presumably switched
- the floppy before the second command. Otherwise the command pair is of doubtful
- usefulness.)
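-
- Since the point of an image copy is that it is exact, it can be worth
- comparing the copy with the original before relying on it. The following is a
- minimal sketch of that idea; the function name make_image is made up for this
- example, and the source may be a device file or an ordinary file.
-
```shell
# Copy SOURCE to IMAGE with dd, then use cmp to confirm that the two are
# byte-for-byte identical. Succeeds only if both steps succeed.
make_image() {
    src=$1
    img=$2
    # dd reports its record counts on standard error; discard them here.
    dd if="$src" of="$img" 2>/dev/null || return 1
    cmp -s "$src" "$img"
}
```
-
- For the floppy above this would be used as
- make_image /dev/fd0H1440 floppy-image && echo "image verified".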
-
-
- 4.8 Allocating disk space
-
- 4.8.1 Partitioning schemes
-
- It is not easy to partition a disk in the best possible way. Worse, there is no
- universally correct way to do it; there are too many factors involved.
-
- The traditional way is to have a (relatively) small root filesystem, which
- contains /bin, /etc, /dev, /lib, /tmp, and other stuff that is needed to get the
- system up and running. This way, the root filesystem (in its own partition or
- on its own disk) is all that is needed to bring up the system. The reasoning is
- that if the root filesystem is small and is not heavily used, it is less likely
- to become corrupt when the system crashes, and you will therefore find it easier
- to fix any problems caused by the crash. Then you create separate partitions or
- use separate disks for the directory tree below /usr, the users' home
- directories (often under /home), and the swap space. Separating the home
- directories (with the users' files) into their own partition makes backups easier,
- since it is usually not necessary to back up programs (which reside below /usr).
- In a networked environment it is also possible to share /usr among several
- machines (e.g., by using NFS), thereby reducing the total disk space required by
- several tens or hundreds of megabytes times the number of machines.
-
- The problem with having many partitions is that it splits the total amount of
- free disk space into many small pieces. Nowadays, when disks and (hopefully)
- operating systems are more reliable, many people prefer to have just one
- partition that holds all their files. On the other hand, it can be less painful
- to back up (and restore) a small partition.
-
- For a small hard disk (assuming you don't do kernel development), the best
- way to go is probably to have just one partition. For large hard disks, it is
- probably better to have a few large partitions, just in case something does go
- wrong. (Note that `small'
- and `large' are used in a relative sense here; your needs for disk space decide
- what the threshold is.)
-
- If you have several disks, you might wish to have the root filesystem
- (including /usr) on one, and the users' home directories on another.
-
- It is a good idea to be prepared to experiment a bit with different
- partitioning schemes (over time, not just while first installing the system).
- This is a bit of work, since it essentially requires you to install the system
- from scratch several times, but it is the only way to be sure you do it right.
-
- 4.8.2 Space requirements
-
-
- The Linux distribution you install will give some indication of how much disk
- space you need for various configurations. Programs installed separately may
- also do the same. This will help you plan your disk space usage, but you should
- prepare for the future and reserve some extra space for things you will later
- notice that you need.
-
- The amount you need for user files depends on what your users wish to do.
- Most people seem to need as much space for their files as possible, but the
- amount they will live happily with varies a lot. Some people do only light text
- processing and will survive nicely with a few megabytes, others do heavy image
- processing and will need gigabytes.
- By the way, when comparing file sizes given in kilobytes or megabytes and
- disk space given in megabytes, it can be important to know that the two units
- can be different. Some disk manufacturers like to pretend that a kilobyte is
- 1000 bytes and a megabyte is 1000 kilobytes, while all the rest of the computing
- world uses 1024 for both factors. Therefore, my 345 MB hard disk is really a 330
- MB hard disk.8
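-
- The arithmetic behind that figure can be checked with the shell. This sketch
- assumes that the manufacturer uses 1000 for both factors, as described above;
- the function name is made up for the example. With integer arithmetic, which
- rounds down, 345 comes out as 329, in line with the rounded 330 quoted above.
-
```shell
# Convert a size given in megabytes of 1000 * 1000 bytes into megabytes
# of 1024 * 1024 bytes, rounding down.
decimal_mb_to_binary_mb() {
    echo $(( $1 * 1000 * 1000 / (1024 * 1024) ))
}

decimal_mb_to_binary_mb 345
```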
-
-
- Swap space allocation is discussed in section 6.5.
-
- 4.8.3 Examples of hard disk allocation
-
-
- I used to have a 109 MB hard disk. Now I am using a 330 MB hard disk. I'll
- explain how and why I partitioned these disks.
- The 109 MB disk I partitioned in a lot of ways, when my needs and the
- operating systems I used changed; I'll explain two typical scenarios. First, I
- used to run MS-DOS together with Linux. For that, I needed about 20 MB of hard
- disk, or just enough to have MS-DOS, a C compiler, an editor, a few other
- utilities, the program I was working on, and enough free disk space to not feel
- claustrophobic. For Linux, I had a 10 MB swap partition, and the rest, or 79 MB,
- was a single partition with all the files I had under Linux. I experimented with
- having separate root, /usr, and /home partitions, but there was never enough
- free disk space in one piece to do much of interest.
- _____________________________
- 8 Sic transit discus mundi.
-
- When I didn't need MS-DOS anymore, I repartitioned the disk so that I had a
- 12 MB swap partition, and again had the rest as a single filesystem.
-
- The 330 MB disk is partitioned into several partitions, like this:
-
- 5 MB root filesystem
- 10 MB swap partition
- 180 MB /usr filesystem
- 120 MB /home filesystem
- 15 MB scratch partition
-
-
- The scratch partition is for playing around with things that require their own
- partition, e.g., trying different Linux distributions, or comparing speeds of
- filesystems. When not needed for anything else, it is used as swap space (I like
- to have a lot of open windows).
-
- 4.8.4 Adding more disk space for Linux
-
- Adding more disk space for Linux is easy, at least after the hardware has been
- properly installed (the hardware installation is outside the scope of this
- book). You format it if necessary, then create the partitions and filesystem as
- described above, and add the proper lines to /etc/fstab so that it is mounted
- automatically.
-
- 4.8.5 Tips for saving disk space
-
- The best tip for saving disk space is to avoid installing unnecessary programs.
- Most Linux distributions have an option to install only part of the packages
- they contain, and by analyzing your needs you might notice that you don't need
- most of them. This will help save a lot of disk space, since many programs are
- quite large. Even if you do need a particular package or program, you might not
- need all of it. For example, some on-line documentation might be unnecessary, as
- might some of the Elisp files for GNU Emacs, some of the fonts for X11, or some
- of the libraries for programming.
-
- If you cannot uninstall packages, you might look into compression.
- Compression programs such as gzip(1) or zip(1) will compress (and uncompress)
- individual files or groups of files. The gzexe system will compress and
- uncompress programs invisibly to the user (unused programs are compressed, then
- uncompressed as they are used). The experimental DouBle system will compress all
- files in a filesystem, invisibly to the programs that use them. (If you are
- familiar with products such as Stacker for MS-DOS, the principle is the same.)
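-
- Before compressing anything it helps to know how much would be gained. The
- sketch below prints the size of a file as it is and as gzip would store it,
- without changing the file. The function name gzip_savings is an invention for
- this example.
-
```shell
# Print two numbers: the size of FILE in bytes, and the size of its
# gzip-compressed form. FILE itself is left untouched (gzip -c writes the
# compressed data to standard output).
gzip_savings() {
    file=$1
    plain=$(wc -c < "$file")
    packed=$(gzip -c "$file" | wc -c)
    # $(( )) strips any padding that wc may put around the number.
    echo "$((plain)) $((packed))"
}
```
-
- Files that are already compressed, such as most image and archive formats,
- will show little or no gain.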
-
-
- Chapter 5
-
-
-
- Directory Tree Overview
- This chapter needs a quote. Suggestions, anyone?
-
- This chapter describes the important parts of a standard Linux directory tree,
- based on the FSSTND filesystem standard. It outlines the normal way of breaking
- the directory tree into separate filesystems with different purposes and gives
- the motivation behind this particular split. Some alternative ways of splitting
- are also described.
-
-
- META: The next version of the FSSTND (1.3?) will cause many minor changes,
- and some new ones, due to work to make the FSSTND work for BSD systems as well.
-
-
- 5.1 Background
- This chapter is loosely based on the Linux filesystem standard, FSSTND, version
- 1.2 (see the bibliography), which attempts to set a standard for how the
- directory tree in a Linux system is organized. Such a standard has the advantage
- that it will be easier to write or port software for Linux, and to administer
- Linux machines, since everything will be in its usual place. There is no
- authority behind the standard that forces anyone to comply with it, but it has
- the support of most, if not all, Linux distributions. It is not a good idea to
- break with the FSSTND without very compelling reasons. The FSSTND attempts to
- follow Unix tradition and current trends, making Linux systems familiar to those
- with experience with other Unix systems, and vice versa.
-
- This chapter is not as detailed as the FSSTND. A system administrator should
- also read the FSSTND for a complete understanding.
-
-
- This chapter does not explain all files in detail. The intention is not to
- describe every file, but to give an overview of the system from a filesystem
- point of view. Further information about each file is available elsewhere in
- this manual or in the manual pages.
-
- The full directory tree is intended to be breakable into smaller parts, each
- on its own disk or partition, to accommodate disk size limits and to ease
- backup and other system administration. The major parts are the root, /usr,
- /var, and /home filesystems. Each part has a different purpose. The directory
- tree has been designed so that it works well in a network of Linux machines
- which may share some parts of the filesystems over a read-only device (e.g., a
- CD-ROM), or over the network with NFS.
-
- The roles of the different parts of the directory tree are described below.
-
- o The root filesystem is specific for each machine (it is generally stored on
- a local disk, although it could possibly be downloaded to a ramdisk during
- bootup) and contains the files that are necessary for booting the system up,
- and to bring it up to such a state that the other filesystems may be mounted.
- The contents of the root filesystem will therefore be sufficient for the
- single user state. It will also contain tools for fixing a broken system, and
- for recovering lost files from backups.
-
-
- o The /usr filesystem contains all commands, libraries, manual pages, and
- other unchanging files needed during normal operation. No files in /usr
- should be specific for any given machine, nor should they be modified during
- normal use. This allows the files to be shared over the network, which can be
- cost-effective since it saves disk space (there can easily be hundreds of
- megabytes in /usr), and can make administration easier (only the master /usr
- needs to be changed when updating an application, not each machine
- separately). Even if the filesystem is on a local disk, it could be mounted
- read-only, to lessen the chance of filesystem corruption during a crash.
-
-
- o The /var filesystem contains files that change, such as spool directories
- (for mail, news, printers, etc), log files, formatted manual pages, and
- temporary files. Traditionally everything in /var has been somewhere below
- /usr, but that made it impossible to mount /usr read-only.
-
-
- o The /home filesystem contains the users' home directories, i.e., all the
- real data on the system. Separating home directories to their own directory
- tree or filesystem makes backups easier; the other parts often do not have to
- be backed up, or at least not as often (they seldom change). A big /home might
- have to be broken into several filesystems, which requires adding an extra
- naming level below /home, e.g., /home/students and /home/staff.
-
-
-
- Although the different parts have been called filesystems above, there is no
- requirement that they actually be on separate filesystems. They could easily
- be kept in a single one if the system is a small single-user system and the
- user wants to keep things simple. The directory tree might also be divided into
- filesystems differently, depending on how large the disks are, and how space is
- allocated for various purposes. The important part, though, is that all the
- standard names work; even if, say, /var and /usr are actually on the same
- partition, the names /usr/lib/libc.a and /var/adm/messages must work, for
- example by moving files below /var into /usr/var, and making /var a symlink to
- /usr/var.
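-
- The symlink arrangement can be tried out safely in a scratch directory. The
- sketch below builds a tiny tree in which var is a symbolic link to usr/var;
- the function name and the sample file contents are made up, and the directory
- passed as the argument stands in for the real root directory.
-
```shell
# Under the scratch directory ROOT, create usr/var/adm/messages and make
# var a relative symbolic link to usr/var, so that ROOT/var/adm/messages
# names the same file as ROOT/usr/var/adm/messages.
make_var_symlink_demo() {
    root=$1
    mkdir -p "$root/usr/var/adm" || return 1
    echo "sample log line" > "$root/usr/var/adm/messages"
    ln -s usr/var "$root/var"    # the link target is relative to ROOT
}
```
-
- After make_var_symlink_demo /tmp/demo, the command cat /tmp/demo/var/adm/messages
- prints the sample line even though the file is stored below /tmp/demo/usr.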
-
-
- The Unix filesystem structure groups files according to purpose, i.e., all
- commands are in one place, all data files in another, documentation in a third,
- and so on. An alternative would be to group files according to the program
- they belong to, i.e., all Emacs files would be in one directory, all TeX files
- in another, and so on. The problem with the latter approach is that it makes it
- difficult to share files (the program directory often contains both static,
- shareable files and changing, non-shareable ones), and sometimes even to find
- the files (e.g., manual pages end up in a huge number of places, and making the
- manual page programs find all of them is a maintenance nightmare).
-
- 5.2 The root filesystem
- The root filesystem should generally be small, since it contains very critical
- files and a small, infrequently modified filesystem has a better chance of not
- getting corrupted. A corrupted root filesystem will generally mean that the
- system becomes unbootable except with special measures (e.g., from a floppy), so
- you don't want to risk it.
-
- The root directory generally doesn't contain any files, except perhaps the
- standard boot image for the system, usually called /vmlinuz. All other files are
- in subdirectories in the root filesystem:
-
- /bin Commands needed during bootup that might be used by normal users
- (probably after bootup).
-
- /sbin Like /bin, but the commands are not intended for normal users,
- although they may use them if necessary and allowed.
-
-
- /etc Configuration files specific to the machine.
-
- /root The home directory for user root.
-
- /lib Shared libraries needed by the programs on the root filesystem.
-
- /lib/modules Loadable kernel modules, especially those that are needed to boot
- the system when recovering from disasters (e.g., network and
- filesystem drivers).
-
- /dev Device files.
-
- /tmp Temporary files. Programs running after bootup should use
- /var/tmp, not /tmp, since the former is probably on a disk with
- more space.
-
- /boot Files used by the bootstrap loader, e.g., LILO. Kernel images are
- often kept here instead of in the root directory. If there are
- many kernel images, the directory can easily grow rather big, and
- it might be better to keep it in a separate filesystem. Another
- reason would be to make sure the kernel images are within the
- first 1024 cylinders of an IDE disk.
-
- /mnt Mount point for temporary mounts by the system administrator.
- Programs aren't supposed to mount on /mnt automatically. /mnt
- might be divided into subdirectories (e.g., /mnt/dosa might be
- the floppy drive using an MS-DOS filesystem, and /mnt/exta might
- be the same with an ext2 filesystem).
-
-
- /proc, /usr, /var, /home Mount points for the other filesystems.
-
- 5.2.1 The /etc directory
-
-
-
- The /etc directory contains a lot of files. Some of them are described below.
- For others, you should determine which program they belong to and read the
- manual page for that program. Many networking configuration files are in /etc as
- well, and are described in the Network Administrator's Guide.
-
- /etc/rc or /etc/rc.d or /etc/rc?.d
- Scripts or directories of scripts to run at startup or when changing the run
- level. See the chapter on init for further information.
-
-
- /etc/passwd
- The user database, with fields giving the username, real name, home
- directory, encrypted password, and other information about each user. The
- format is documented in the passwd(5) manual page.
-
-
- /etc/fdprm
- Floppy disk parameter table. Describes what different floppy disk formats
- look like. Used by setfdprm(1). See the setfdprm(8) manual page for more
- information.
-
-
- /etc/fstab
- Lists the filesystems mounted automatically at startup by the mount -a
- command (in /etc/rc or equivalent startup file). Under Linux, also contains
- information about swap areas used automatically by swapon -a. See section
- 4.6.5 and the mount(8) manual page for more information.
-
-
- /etc/group
- Similar to /etc/passwd, but describes groups instead of users. See the
- group(5) manual page for more information.
-
-
- /etc/inittab
- Configuration file for init(8).
-
-
- /etc/issue
- Output by getty before the login prompt. Usually contains a short
- description or welcoming message to the system. The contents are up to the
- system administrator.
-
-
- /etc/magic
- The configuration file for file(1). Contains the descriptions of various file
- formats based on which file guesses the type of the file. See the magic(8)
- and file(1) manual pages for more information.
-
-
- /etc/motd
- The message of the day, automatically output after a successful login.
- Contents are up to the system administrator. Often used for getting
- information to every user, such as warnings about planned downtimes.
-
-
- /etc/mtab
- List of currently mounted filesystems. Initially set up by the startup scripts,
- and updated automatically by the mount command. Used when a list of mounted
- filesystems is needed, e.g., by the df(1) command.
-
-
- /etc/shadow
- Shadow password file on systems with shadow password software installed.
- Shadow passwords move the encrypted password from /etc/passwd into
- /etc/shadow; the latter is not readable by anyone except root. This makes it
- harder to crack passwords.
-
- /etc/login.defs
- Configuration file for the login(1) command.
-
-
- /etc/printcap
- Like /etc/termcap, but intended for printers. Different syntax.
-
-
- /etc/profile, /etc/csh.login, /etc/csh.cshrc
- Files executed at login or startup time by the Bourne or C shells. These
- allow the system administrator to set global defaults for all users. See the
- manual pages for the respective shells.
-
-
- /etc/securetty
- Identifies secure terminals, i.e., the terminals from which root is allowed
- to log in. Typically only the virtual consoles are listed, so that it becomes
- impossible (or at least harder) to gain superuser privileges by breaking into
- a system over a modem or a network.
-
-
- /etc/shells
- Lists trusted shells. The chsh(1) command allows users to change their login
- shell only to shells listed in this file. ftpd, the server process that
- provides FTP services for a machine, will check that the user's shell is
- listed in /etc/shells and will not let people log in unless the shell is
- listed there.
-
-
- /etc/termcap
- The terminal capability database. Describes by what "escape sequences"
- various terminals can be controlled. Programs are written so that instead of
- directly outputting an escape sequence that only works on a particular brand
- of terminal, they look up the correct sequence to do whatever it is they want
- to do in /etc/termcap. As a result most programs work with most kinds of
- terminals. See the termcap(5), curs_termcap(3), and terminfo(5) manual pages
- for more information.
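-
- As an illustration of how two of these files work together, the sketch below
- repeats the kind of test described for /etc/shells: it looks up a user's login
- shell in a passwd-format file and checks whether that shell is listed in a
- shells-format file. The function name is made up, and this is not how ftpd or
- chsh are actually implemented; both file names are parameters so that the
- check can be run on copies.
-
```shell
# Succeed if USER's login shell, taken from PASSWD_FILE, appears as a
# complete line in SHELLS_FILE. Fail if the user is unknown or the shell
# is not listed.
shell_is_listed() {
    user=$1
    passwd_file=$2
    shells_file=$3
    # In a passwd entry, field 1 is the username and field 7 the login shell.
    shell=$(awk -F: -v u="$user" '$1 == u { print $7; exit }' "$passwd_file")
    [ -n "$shell" ] || return 1
    # -F: fixed string, -x: match the whole line, -q: no output.
    grep -F -x -q -- "$shell" "$shells_file"
}
```
-
- On a system without shadow passwords, shell_is_listed root /etc/passwd /etc/shells
- would be a typical call.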
-
-
- META: HOSTNAME, adjtime, disktab, gettydefs, networking (exports, host.conf,
- hosts, hosts.equiv, inetd.conf, named.*, networks, ntp.conf, protocols,
- resolv.conf, rpc, services, syslog.conf), mtools, and so forth.
-
- 5.2.2 The /dev directory
-
-
-
- The /dev directory contains the special device files for all the devices. The
- device files are named using special conventions; these are described in
- appendix C. The device files are created during installation, and later with the
- /dev/MAKEDEV script. /dev/MAKEDEV.local is a script written by the system
- administrator that creates local-only device files or links (i.e., those that
- are not part of the standard MAKEDEV, such as device files for some
- non-standard device driver).
-
-
-
- 5.3 The /usr filesystem
-
- The /usr filesystem is often large, since all programs are installed there. All
- files in /usr usually come from a Linux distribution; locally installed programs
- and other files go below /usr/local. This makes it possible to update the
- system from a new version of the distribution, or even a completely new
- distribution, without having to install all programs again. Some of the
- subdirectories of /usr are listed below (some of the less important directories
- have been dropped; see the FSSTND for more information).
-
- /usr/X11R6
- The X Window System, all files. To simplify the development and installation
- of X, the X files have not been integrated into the rest of the system. There
- is a directory tree below /usr/X11R6 similar to that below /usr itself.
-
-
- /usr/X386
- Similar to /usr/X11R6, but for X11 Release 5.
-
-
- /usr/bin
- Almost all user commands. Some commands are in /bin or in /usr/local/bin.
-
-
- /usr/sbin
- System administration commands that are not needed on the root filesystem,
- e.g., most server programs.
-
-
- /usr/man, /usr/info, /usr/doc
- Manual pages, GNU Info documents, and miscellaneous other documentation
- files, respectively.
-
-
- /usr/include
- Header files for the C programming language. This should actually be below
- /usr/lib for consistency, but the tradition is overwhelmingly in support of
- this name.
-
-
- /usr/lib
- Unchanging data files for programs and subsystems, including some site-wide
- configuration files. The name lib comes from library; originally libraries of
- programming subroutines were stored in /usr/lib.
-
-
- /usr/local
- The place for locally installed software and other files.
-
- 5.4 The /var filesystem
-
-
- The /var filesystem contains data that is changed when the system is running
- normally. It is specific for each system, i.e., not shared over the network
- with other computers.
-
- /var/catman
- A cache for man pages that are formatted on demand. The source for manual
- pages is usually stored in /usr/man/man*; some manual pages might come with a
- pre-formatted version, which is stored in /usr/man/cat*. Other manual pages
- need to be formatted when they are first viewed; the formatted version is
- then stored in /var/catman so that the next person to view the same page won't
- have to wait for it to be formatted. (/var/catman is often cleaned in the
- same way temporary directories are cleaned.)
-
-
- /var/lib
- Files that change while the system is running normally.
-
-
- /var/local
- Variable data for programs that are installed in /usr/local (i.e., programs
- that have been installed by the system administrator). Note that even locally
- installed programs should use the other /var directories if they are
- appropriate, e.g., /var/lock.
-
-
- /var/lock
- Lock files. Many programs follow a convention to create a lock file in
- /var/lock to indicate that they are using a particular device or file. Other
- programs will notice the lock file and won't attempt to use the device or
- file.
-
-
- /var/log
- Log files from various programs, especially login (/var/log/wtmp, which logs
- all logins and logouts into the system) and syslog (/var/log/messages, where
- all kernel and system program messages are usually stored). Files in /var/log
- can often grow indefinitely, and may require cleaning at regular intervals.
-
-
- /var/run
- Files that contain information about the system that is valid until the
- system is next booted. For example, /var/run/utmp contains information about
- people currently logged in.
-
-
- /var/spool
- Directories for mail, news, printer queues, and other queued work. Each
- different spool has its own subdirectory below /var/spool, e.g., the
- mailboxes of the users are in /var/spool/mail.
-
- /var/tmp
- Temporary files that are large or that need to exist for a longer time than
- what is allowed for /tmp. (Although the system administrator might not allow
- very old files in /var/tmp either.)
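-
- The lock file convention mentioned under /var/lock can be sketched in a few
- lines of shell. This is only an illustration of the idea: the function names
- are made up, and real programs follow their own rules for naming lock files
- and for what they store in them, which this sketch does not try to reproduce.
-
```shell
# Try to take a lock by creating LOCKFILE containing this shell's process
# ID. With noclobber set (set -C) the redirection refuses to overwrite an
# existing file, so checking for the lock and creating it is one step.
take_lock() {
    lockfile=$1
    ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null
}

# Release a lock taken with take_lock.
release_lock() {
    rm -f "$1"
}
```
-
- A program would call take_lock with a name below /var/lock, give up or wait
- if the call fails, and call release_lock when it is done with the device.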
-
- 5.5 The /proc filesystem
-
- The /proc filesystem is an illusionary filesystem. It does not exist on a
- disk. Instead, the kernel creates it in memory. It is used to provide
- information about the system (originally about processes, hence the name). Some
- of the more important files and directories are explained below. The /proc
- filesystem is described in more detail in the proc(5) manual page.
-
-
-
- /proc/1
- A directory with information about process number 1. Each process has a
- directory below /proc with the name being its process identification number.
-
-
- /proc/cpuinfo
- Information about the processor, such as its type, make, model, and
- performance.
-
-
- /proc/devices
- List of device drivers configured into the currently running kernel.
-
-
- /proc/dma
- Shows which DMA channels are being used at the moment.
-
-
- /proc/filesystems
- Filesystems configured into the kernel.
-
-
- /proc/interrupts
- Shows which interrupts are in use, and how many of each there have been.
-
-
- /proc/ioports
- Which I/O ports are in use at the moment.
-
-
- /proc/kcore
- An image of the physical memory of the system. This is exactly the same size
- as your physical memory, but does not really take up that much memory; it is
- generated on the fly as programs access it. (Remember: unless you copy it
- elsewhere, nothing under /proc takes up any disk space at all.)
-
-
- /proc/kmsg
- Messages output by the kernel. These are also routed to syslog.
-
-
- /proc/ksyms
- Symbol table for the kernel.
-
- /proc/loadavg
- The `load average' of the system; three meaningless indicators of how much
- work the system has to do at the moment.
-
-
- /proc/meminfo
- Information about memory usage, both physical and swap.
-
-
- /proc/modules
- Which kernel modules are loaded at the moment.
-
-
- /proc/net
- Status information about network protocols.
-
-
- /proc/self
- A symbolic link to the process directory of the program that is looking at
- /proc. When two processes look at /proc, they get different links. This is
- mainly a convenience to make it easier for programs to get at their process
- directory.
-
-
- /proc/stat
- Various statistics about the system, such as the number of page faults since
- the system was booted.
-
-
- /proc/uptime
- The time the system has been up.
-
-
- /proc/version
- The kernel version.
-
-
-
- Note that while the above files tend to be easily readable text files, they can
- sometimes be formatted in a way that is not easily digestible. There are many
- commands that do little more than read the above files and format them for
- easier understanding. For example, the free program reads /proc/meminfo and
- converts the amounts given in bytes to kilobytes (and adds a little more
- information, as well).
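As an illustration of how thin such a wrapper can be, here is a minimal sketch in shell of a command that reads a file in the /proc/uptime format and prints the uptime in days, hours, and minutes. The name format_uptime is made up for this example, and the sketch assumes that the first field of the file is the uptime in seconds, possibly with a fractional part.

```sh
# format_uptime [FILE]: hypothetical helper, not a standard command.
# Reads the first field (uptime in seconds) of FILE, which defaults to
# /proc/uptime, and prints it as days, hours, and minutes.
format_uptime() {
    file=${1:-/proc/uptime}
    # The first word of the first line is the uptime; ignore the rest.
    read -r seconds _rest < "$file"
    # Drop the fractional part, since shell arithmetic is integer only.
    seconds=${seconds%%.*}
    days=$(( seconds / 86400 ))
    hours=$(( seconds % 86400 / 3600 ))
    minutes=$(( seconds % 3600 / 60 ))
    echo "${days}d ${hours}h ${minutes}m"
}
```

Called without an argument, format_uptime reads /proc/uptime itself; passing a file name makes it easy to try the function on saved data.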
-
-
- Chapter 6
-
-
-
- Memory Management
-
-
-
- Minnet, jag har tappat mitt minne,
- är jag svensk eller finne
- kommer inte ihåg...
-
- Inne, är jag ute eller inne
- jag har luckor i minnet,
- sådär små ALKO-HÅL
- Men besinne,
- man tätar med det brännvin man får,
- fastän minnet och helan går.
-
- (Bosse Österberg)
-
- This chapter describes the Linux memory management features, i.e., virtual
- memory and the disk buffer cache. It covers their purpose and workings, and the
- things the system administrator needs to take into consideration.
-
- 6.1 What is virtual memory?
-
-
-
- Linux supports virtual memory, that is, using a disk as an extension of RAM so
- that the effective size of usable memory grows correspondingly. The kernel will
- write the contents of a currently unused block of memory to the hard disk so
- that the memory can be used for another purpose. When the original contents are
- needed again, they are read back into memory. This is all made completely
- transparent to the user; programs running under Linux only see the larger
- amount of memory available and don't notice that parts of them reside on the
- disk from time to time. Of course, reading and writing the hard disk is slower
- (on the order of a thousand times slower) than using real memory, so the
- programs don't run as fast. The part of the hard disk that is used as virtual
- memory is called the swap space.
-
-
- Linux can use either a normal file in the filesystem or a separate partition
- for swap space. A swap partition is faster, but it is easier to change the size
- of a swap file (there's no need to repartition the whole hard disk, and possibly
- install everything from scratch). When you know how much swap space you need,
- you should go for a swap partition, but if you are uncertain, you can use a swap
- file first, use the system for a while so that you can get a feel for how much
- swap you need, and then make a swap partition when you're confident about its
- size.
-
-
- You should also know that Linux allows one to use several swap partitions
- and/or swap files at the same time. This means that if you only occasionally
- need an unusual amount of swap space, you can set up an extra swap file at such
- times, instead of keeping the whole amount allocated all the time.
-
- 6.2 Creating a swap area
-
-
-
- A swap file is an ordinary file; it is in no way special to the kernel. The only
- thing that matters to the kernel is that it has no holes, and that it is
- prepared for use with mkswap(8). It must reside on a local disk, however; it
- can't reside in a filesystem that has been mounted over NFS.
-
-
- The bit about holes is important. The swap file reserves the disk space so
- that the kernel can quickly swap out a page without having to go through all the
- things that are necessary when allocating a disk sector to a file. The kernel
- merely uses any sectors that have already been allocated to the file. Because a
- hole in a file means that there are no disk sectors allocated (for that place in
- the file), it is not good for the kernel to try to use them.
-
-
- One good way to create the swap file without holes is through the following
- command:
-
-
-
- ttyp5 root ~ $ dd if=/dev/zero of=/extra-swap bs=1024 count=1024
- 1024+0 records in
- 1024+0 records out
- ttyp5 root ~ $
-
-
-
- where /extra-swap is the name of the swap file and its size, in kilobytes
- (because of bs=1024), is given after count=. It is best for the size to be a
- multiple of 4, because the kernel
- writes out memory pages, which are 4 kilobytes in size. If the size is not a
- multiple of 4, the last couple of kilobytes may be unused.
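The rounding rule can be folded into the dd invocation. The following is a minimal sketch; the function names round_up_to_page and make_swap_file are invented for this example, and the 4 kB page size is the one quoted in the text.

```sh
# round_up_to_page KB: print KB rounded up to the next multiple of 4,
# i.e. to a whole number of 4 kB memory pages.
round_up_to_page() {
    echo $(( ($1 + 3) / 4 * 4 ))
}

# make_swap_file PATH KB: hypothetical wrapper around the dd command shown
# above. It writes zeroes for every block, so the file has no holes, and
# its size is a multiple of 4 kB.
make_swap_file() {
    dd if=/dev/zero of="$1" bs=1024 count="$(round_up_to_page "$2")" 2>/dev/null
}
```

For example, make_swap_file /extra-swap 1022 would create a file of 1024 kB. This only creates the file; it is not yet a usable swap area.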
-
-
- A swap partition is also not special in any way. You create it just like any
- other partition; the only difference is that it is used as a raw partition, that
- is, it will not contain any filesystem at all. It is a good idea to mark swap
- partitions as type 82 (Linux swap); this will make the partition listings
- clearer, even though it is not strictly necessary to the kernel.
-
-
- After you have created a swap file or a swap partition, you need to write a
- signature to its beginning; this contains some administrative information and is
- used by the kernel. The command to do this is mkswap(8), used like this:
-
-
-
- ttyp5 root ~ $ mkswap /extra-swap 1024
- Setting up swapspace, size = 1044480 bytes
- ttyp5 root ~ $
-
-
-
- Note that the swap space is not in use yet: it exists, but the kernel does
- not use it to provide virtual memory.
-
-
- The Linux memory manager limits the size of each swap area to 127.5 MB. A
- larger swap space can be created, but only the first 127.5 MB are actually used.
- You can, however, use up to 16 swap spaces simultaneously, for a total of almost
- 2 GB. (1)
-
- 6.3 Using a swap area
-
-
-
- An initialized swap area is taken into use with swapon(8). This command tells
- the kernel that the swap area can be used. The path to the swap area is given
- as the argument, so to start swapping on a temporary swap file one might use the
- following command:
-
- ttyp5 root ~ $ swapon /extra-swap
- ttyp5 root ~ $
-
- Swap areas can be used automatically by listing them in the /etc/fstab file.
-
-
- /dev/hda8 swap swap defaults
- _____________________________
- (1) A gigabyte here, a gigabyte there, pretty soon we start talking about real
- memory.
-
-
-
- The startup scripts will run the command swapon -a, which will start swapping on
- all the swap areas listed in /etc/fstab. Therefore, the swapon command is
- usually used only when extra swap is needed.
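To see which swap areas swapon -a would act on, one can pick the swap entries out of /etc/fstab. This is a minimal sketch: the function name list_fstab_swap is made up, and it assumes the usual fstab layout in which the third field is the filesystem type and lines starting with # are comments.

```sh
# list_fstab_swap [FILE]: hypothetical helper. Prints the first field (the
# partition or file name) of every non-comment line in an fstab-format FILE
# whose third field, the filesystem type, is "swap".
list_fstab_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

With the example line above in /etc/fstab, list_fstab_swap prints /dev/hda8.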
-
-
-
- You can monitor the use of swap areas with free(1). It will tell the total
- amount of swap space used. The same information is available via top(1), or
- using the proc filesystem in file /proc/meminfo. It is currently difficult to
- get information on the use of a specific swap area.
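The total swap in use can also be computed directly from /proc/meminfo. The sketch below is only an illustration: swap_used_kb is a hypothetical name, and it assumes the file contains "SwapTotal: N kB" and "SwapFree: N kB" lines, which is the layout of current kernels; older kernels may lay the file out differently.

```sh
# swap_used_kb [FILE]: hypothetical helper. Assumes FILE (default
# /proc/meminfo) contains lines of the form "SwapTotal: N kB" and
# "SwapFree: N kB", and prints the difference in kilobytes.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

Like free(1), this reports one total for all swap areas together.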
-
-
-
- A swap area can be removed from use with swapoff(8). It is usually not
- necessary to do it, except for temporary swap areas. Any pages in use in the
- swap area are swapped in first; if there is not sufficient physical memory to
- hold them, they will then be swapped out (to some other swap area). If there is
- not enough virtual memory to hold all of the pages, Linux will start to thrash;
- after a long while it should recover, but meanwhile the system is unusable. You
- should check (e.g., with free) that there is enough free memory before removing
- a swap space from use.
-
-
-
- All the swap areas that are used automatically with swapon -a can be removed
- from use with swapoff -a; it looks at the file /etc/fstab to find what to
- remove. Any manually used swap areas will remain in use.
-
-
-
- Sometimes a lot of swap space can be in use even though there is a lot of
- free physical memory. This can happen for instance if at one point there is need
- to swap, but later a big process that occupied much of the physical memory
- terminates and frees the memory. The swapped-out data is not automatically
- swapped in until it is needed, so the physical memory may remain free for a long
- time. There is no need to worry about this, but it can be comforting to know
- what is happening.
-
-
-
- 6.4 Sharing swap areas with other operating systems
-
- Virtual memory is built into many operating systems. Since they each need it
- only when they are running, i.e., never at the same time, the swap areas of all
- but the currently running one are being wasted. It would be more efficient for
- them to share a single swap area. This is possible, but can require a bit of
- hacking. The Tips-HOWTO contains some advice on how to implement this.
-
-
-
- 6.5 Allocating swap space
-
-
-
- Some people will tell you that you should allocate twice as much swap space as
- you have physical memory, but this is a bogus rule. Here's how to do it
- properly:
-
-
- 1. Estimate your total memory needs. This is the largest amount of memory
- you'll probably need at a time, that is, the sum of the memory requirements of
- all the programs you want to run at the same time. One way to measure this is
- to run, all at once, the programs you are ever likely to be running together.
- For instance, if you want to run X, you should allocate about 8 MB for it,
- gcc wants several megabytes (some files need an unusually large amount, up to
- several tens of megabytes, but usually about four should do), and so on. The
- kernel will use about a megabyte by itself, and the usual shells and other small
- utilities perhaps a few hundred kilobytes (say a megabyte together). There is no
- need to try to be exact; rough estimates are fine, but you might want to be on
- the pessimistic side. Remember that if there are going to be several people
- using the system at the same time, they are all going to consume memory.
- (However, if two people run the same program at the same time, the total memory
- consumption is usually not double, since code pages and shared libraries exist
- only once.) The free(1) and ps(1) commands are useful for estimating the memory
- needs.
-
-
- 2. Add a safety margin to the estimate in step 1. This is because estimates of
- program sizes will probably be wrong, because you'll probably forget some
- programs you want to run, and to make certain that you have some extra space
- just in case. A couple of megabytes should be fine. (It is better to allocate
- too much than too little swap space, but there's no need to over-do it and
- allocate the whole disk, since unused swap space is wasted space; see later
- about adding more swap.) Also, since it is nicer to deal with even numbers, you
- can round the value up to the next full megabyte.
-
-
- 3. Based on the computations in steps 1 and 2, you know how much memory you'll
- be needing in total. So, in order to allocate swap space, you just need to
- subtract the size of your physical memory from the total memory needed, and you
- know how much swap space you need. (On some versions of UNIX, you need to
- allocate space for an image of the physical memory as well, so the amount
- computed in step 2 is what you need and you shouldn't do the subtraction.)
-
-
- 4. If your calculated swap area is very much larger than your physical memory
- (more than a couple of times larger), you should probably invest in more physical
- memory; otherwise performance will be too low.
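The four steps above can be sketched as a small calculation. The function name swap_needed_mb is invented for this example, all figures are whole megabytes, and "a couple of times larger" in step 4 is taken, as an assumption, to mean more than twice the physical memory.

```sh
# swap_needed_mb NEEDS MARGIN PHYSICAL: hypothetical helper, all arguments
# in whole megabytes. Steps 1 to 3: (estimated needs + safety margin) minus
# physical memory, but never less than zero.
swap_needed_mb() {
    needs=$1 margin=$2 physical=$3
    swap=$(( needs + margin - physical ))
    if [ "$swap" -lt 0 ]; then
        swap=0
    fi
    # Step 4: warn on stderr when swap would exceed twice the physical
    # memory (the factor of two is an assumption of this sketch).
    if [ "$swap" -gt $(( 2 * physical )) ]; then
        echo "warning: consider adding physical memory" >&2
    fi
    echo "$swap"
}
```

With the rough figures from step 1 (8 MB for X, 4 MB for gcc, 1 MB for the kernel, 1 MB for shells and utilities), a 2 MB margin, and 8 MB of physical memory, swap_needed_mb 14 2 8 prints 8.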
-
-
-
- 6.6 The buffer cache
-
-
-
- Reading from a disk (2) is very slow compared to accessing (real) memory. In
- addition, it is common to read the same part of a disk several times during
- relatively short periods of time. For example, one might first read an e-mail
- message, then read the letter into an editor when replying to it, then make the
- mail program read it again when copying it to a folder. Or, consider how often
- the command ls might be run on a system with many users. By reading the
- information from disk only once and then keeping it in memory until no longer
- needed, one can speed up all but the first read. This is called disk buffering,
- and the memory used for the purpose is called the buffer cache.
-
-
- Since memory is, unfortunately, a finite, nay, scarce resource, the buffer
- cache usually cannot be big enough (it can't hold all the data one ever wants to
- use). When the cache fills up, the data that has been unused for the longest
- time is discarded and the memory thus freed is used for the new data.
-
-
- Disk buffering works for writes as well. On the one hand, data that is
- written is often soon read again (e.g., a source code file is saved to a file,
- then read by the compiler), so putting data that is written in the cache is a
- good idea. On the other hand, by only putting the data into the cache, not
- writing it to disk at once, the program that writes runs quicker. The writes can
- then be done in the background, without slowing down the other programs.
-
-
- Most operating systems have buffer caches (although they might be called
- something else), but not all of them work according to the above principles.
- Some are write-through: the data is written to disk at once (it is kept in the
- cache as well, of course). The cache is called write-back if the writes are
- done at a later time. Write-back is more efficient than write-through, but also
- a bit more prone to errors: if the machine crashes, or the power is cut at a bad
- moment, or the floppy is removed from the disk drive before the data in the
- cache waiting to be written gets written, the changes in the cache are usually
- lost. This might even mean that the filesystem (if there is one) is not in full
- working order, perhaps because the unwritten data held important changes to the
- bookkeeping information. Because of this, you should never turn off the power
- without using a proper shutdown procedure (see an as yet unwritten chapter),
- or remove a floppy from the disk drive until it has been unmounted (if it was
- mounted) or until whatever program is using it has signaled that it is finished
- and the floppy drive light has gone out. The sync(8) command flushes the
- buffer, i.e., forces all unwritten data to be written to disk, and can be used
- when one wants to be sure that everything is safely written. In traditional
- UNIX systems, there is a program running in the background which does a sync
- every 30 seconds, so it is usually not necessary to use sync. Linux has an
- additional daemon, bdflush(8), that performs a less complete sync more
- frequently, to avoid the sudden freeze due to heavy disk I/O that sync
- sometimes causes.
- _____________________________
- (2) Except a RAM disk, for obvious reasons.
-
-
- The cache does not actually buffer files, but blocks, which are the smallest
- units of disk I/O (under Linux, they are usually 1 kB). This way, directories,
- super blocks, other filesystem bookkeeping data, and non-filesystem disks are
- cached as well.
-
-
- The effectiveness of a cache is primarily decided by its size. A small cache
- is next to useless: it will hold so little data that all cached data is
- flushed from the cache before it is reused. The critical size depends on how
- much data is read and written, and how often the same data is accessed. The only
- way to know is to experiment.
-
-
- If the cache is of a fixed size, it is not very good to have it too big,
- either, because that might make the free memory too small and cause swapping
- (which is also slow). To make the most efficient use of real memory, Linux
- automatically uses all free RAM for buffer cache, but also automatically makes
- the cache smaller when programs need more memory.
-
-
- Under Linux, you do not need to do anything to make use of the cache; it
- happens completely automatically. Except for following the proper procedures for
- shutdown and removing floppies, you do not need to worry about it.
-
-
-
-
-
- Chapter 7
-
-
-
- Logging In And Out
- This chapter needs a quote. Suggestions, anyone?
-
- This chapter describes what happens when a user logs in or out. The various
- interactions of background processes, log files, configuration files, and so
- on are described in some detail.
-
- 7.1 Logins via terminals
-
- Figure 7.1 shows how logins happen via terminals. First, init makes sure there
- is a getty program for the terminal connection (or console). getty listens at
- the terminal and waits for the user to signal that he is ready to log in (this
- usually means that the user must type something). When it notices a user, getty
- outputs a welcome message (stored in /etc/issue), and prompts for the username,
- and finally runs the login program. login gets the username as a parameter, and
- prompts the user for the password. If these match, login starts the shell
- configured for the user; else it just exits and terminates the process (perhaps
- after giving the user another chance at entering the username and password).
- init notices that the process terminated, and starts a new getty for the
- terminal.
-
- Note that the only new process is the one created by init (using the fork(2)
- system call); getty and login only replace the program running in the process
- (using the exec(3) system call).
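The difference between creating a process and replacing the program inside one can be observed from the shell, whose exec builtin does the latter. This is only an illustration with made-up function names; it does not involve init, getty, or login themselves.

```sh
# show_exec_keeps_pid: a shell prints its process ID, then uses exec to
# replace itself with a second shell, which prints its own process ID.
# Both lines show the same number, because no new process was created.
show_exec_keeps_pid() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# show_fork_new_pid: the same, but without exec. The second shell runs as
# a new child process, so the two numbers differ. The trailing ":" keeps
# the first shell alive until the child has finished.
show_fork_new_pid() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}
```

In the login sequence, getty and login hand over to the next program in the first way, which is why the shell ends up running in the very process that init created.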
-
- A separate program for noticing the user is needed for serial lines, since it
- can be (and traditionally was) complicated to notice when a terminal becomes
- active. getty also adapts to the speed and other settings of the connection,
- which is important especially for dial-in connections, where these parameters
- may change from call to call.
-
- There are several versions of getty and init in use, all with their good and
- bad points. It is a good idea to learn about the versions on your system, and
- also about the other versions (you could use the Linux Software Map to search
- for them). If you don't have dial-ins, you probably don't have to worry about
- getty, but init is still important.
-
- 7.2 Logins via the network
-
- Two computers in the same network are usually linked via a single physical
- cable. When they communicate over the network, the programs in each computer
- that take part in the communication are linked via a virtual connection, a sort
- of imaginary cable. As far as the programs at either end of the virtual
- connection are concerned, they have a monopoly on their own cable. However,
- since the cable is not real, only imaginary, the operating systems of both
- computers can have several virtual connections share the same physical cable.
- This way, using just a single cable, several programs can communicate without
- having to know of or care about the other communications. It is even possible
- to have several computers use the same cable; the virtual connections exist
- between two computers, and the other computers ignore those connections that
- they don't take part in.
-
- That's a complicated and over-abstracted description of the reality. It
- might, however, be good enough to understand the important reason why network
- logins are somewhat different from normal logins. The virtual connections are
- established when there are two programs on different computers that wish to
- communicate. Since it is in principle possible to login from any computer in a
- network to any other computer, there is a huge number of potential virtual
- communications. Because of this, it is not practical to start a getty for each
- potential login.
-
- There is a single process corresponding to getty that handles all network
- logins. When it notices an incoming network login (i.e., it notices that it gets
- a new virtual connection to some other computer), it starts a new process to
- handle that single login. The original process remains and continues to listen
- for new logins.
-
- To make things a bit more complicated, there is more than one communication
- protocol for network logins. The two most important ones are telnet and rlogin.
- In addition to logins, there are many other virtual connections that may be made
- (for FTP, Gopher, HTTP, and other network services). It would be inefficient to
- have a separate process listening for each type of connection, so instead
- there is only one listener that can recognize the type of the connection and
- can start the correct type of program to provide the service. This single
- listener is called inetd; see the "Linux Network Administrators' Guide" for more
- information.
-
- 7.3 What login does
-
- The login program takes care of authenticating the user (making sure that the
- username and password match), and of setting up an initial environment for the
- user by setting permissions for the serial line and starting the shell.
-
- Part of the initial setup is outputting the contents of the file /etc/motd
- (short for message of the day) and checking for electronic mail. These can be
- disabled by creating a file called .hushlogin in the user's home directory.
-
- If the file /etc/nologin exists, logins are disabled. That file is typically
- created by shutdown(8) and relatives. login checks for this file, and will
- refuse to accept a login if it exists. If it does exist, login outputs its
- contents to the terminal before it quits.
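The /etc/nologin test amounts to very little logic. The following is a simplified sketch in shell with a made-up function name; the real login program is not a shell script and may differ in details.

```sh
# check_nologin [FILE]: hypothetical sketch of the test described above.
# If FILE (default /etc/nologin) exists, print its contents and return 1,
# meaning the login is refused; otherwise return 0.
check_nologin() {
    file=${1:-/etc/nologin}
    if [ -e "$file" ]; then
        cat "$file"
        return 1
    fi
    return 0
}
```

A caller would use it as: check_nologin || exit 1, so that the message is shown and the login attempt ends.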
-
- login logs all failed login attempts in a system log file (via syslog). It
- also logs all logins by root. Both of these can be useful when tracking down
- intruders.
-
- Currently logged in people are listed in /var/run/utmp. This file is valid
- only until the system is next rebooted or shut down; it is cleared when the
- system is booted. It lists each user and the terminal (or network connection) he
- is using, along with some other useful information. The who, w, and other
- similar commands look in utmp to see who is logged in.
-
- All successful logins are recorded into /var/log/wtmp. This file will grow
- without limit, so it must be cleaned regularly, for example by having a weekly
- cron job to clear it. (1) The last command browses wtmp.
-
- Both utmp and wtmp are in a binary format (see the utmp(5) manual page); it
- is unfortunately not convenient to examine them without special programs.
-
- _____________________________
- (1) Good Linux distributions do this out of the box.
-
- 7.4 X and xdm
-
- META: X implements logins via xdm; also: xterm -ls
-
- 7.5 Access control
-
- The user database is traditionally contained in the /etc/passwd file. Some
- systems use shadow passwords, and have moved the passwords to /etc/shadow. Sites
- with many computers that share the accounts use NIS or some other method to
- store the user database; they might also automatically copy the database from
- one central location to all other computers.
-
- The user database contains not only the passwords, but also some additional
- information about the users, such as their real names, home directories, and
- login shells. This other information needs to be public, so that anyone can read
- it. Therefore the password is stored encrypted. This does have the drawback that
- anyone with access to the encrypted password can use various cryptographic
- methods to guess it, without trying to actually log into the computer. Shadow
- passwords try to avoid this by moving the password into another file, which only
- root can read (the password is still stored encrypted). However, installing
- shadow passwords later onto a system that did not support them can be difficult.
-
-
- With or without shadow passwords, it is important to make sure that all
- passwords in a system are good, i.e., not easily guessable. The crack program
- can be used to crack passwords; any password it can find is by definition not a
- good one. While crack can be run by intruders, it can also be run by the system
- administrator to avoid bad passwords. Good passwords can also be enforced by the
- passwd(1) program; this is in fact more efficient in CPU cycles, since cracking
- passwords requires quite a lot of computation.
-
- The user group database is kept in /etc/group; for systems with shadow
- passwords, there can be a /etc/shadow.group.
-
- root usually can't log in via most terminals or the network, only via
- terminals listed in the /etc/securetty file. This makes it necessary to get
- physical access to one of these terminals. It is, however, possible to log in
- via any terminal as any other user, and use the su command to become root.
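The terminal check can be sketched as a lookup in the file. This is an illustration only: tty_is_secure is a made-up name, and the sketch assumes that /etc/securetty lists one terminal name per line.

```sh
# tty_is_secure TTY [FILE]: hypothetical helper. Returns 0 if TTY appears
# on a line of its own in FILE (default /etc/securetty), 1 otherwise.
# -F treats TTY as a fixed string and -x requires a whole-line match.
tty_is_secure() {
    grep -qxF -- "$1" "${2:-/etc/securetty}"
}
```

For example, tty_is_secure tty1 succeeds only if the file has a line reading exactly tty1.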
-
-
- 7.6 Shell startup
-
- When an interactive login shell starts, it automatically executes one or more
- predefined files. Different shells execute different files; see the
- documentation of each shell for further information.
-
- Most shells first run some global file; for example, the Bourne shell
- (/bin/sh) and its derivatives execute /etc/profile; in addition, they execute
- ~/.profile. /etc/profile allows the system administrator to set up a common
- user environment, especially by setting the PATH to include local command
- directories in addition to the normal ones. On the other hand, ~/.profile allows
- the user to customize the environment to his own tastes by overriding, if
- necessary, the default environment.
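As a small illustration for Bourne-style shells, the two files might contain fragments like the following. The directory names are examples chosen for this sketch, not requirements.

```sh
# Fragment for /etc/profile: the administrator puts a local command
# directory in front of the default search path for every login shell.
PATH="/usr/local/bin:$PATH"
export PATH

# Fragment for ~/.profile: a user adds a private directory on top of that,
# overriding the system-wide default without changing it for anyone else.
PATH="$HOME/bin:$PATH"
export PATH
```

Because ~/.profile is executed after /etc/profile, the user's directory ends up first in the search path.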
-
-
- Figure 7.1: Logins via terminals: the interaction of init, getty, login, and
- the shell (here, /bin/sh).
-
- Appendix A
-
-
-
- Design and Implementation of the Second Extended Filesystem
-
- This appendix is a paper written by Rémy Card (card@masi.ibp.fr), Theodore Ts'o
- (tytso@mit.edu), and Stephen Tweedie (sct@dcs.ed.ac.uk), the designers and
- implementors of the ext2 filesystem. It was first published in the Proceedings
- of the First Dutch International Symposium on Linux, ISBN 90 367 0385 9.
-
- Introduction
-
- Linux is a Unix-like operating system, which runs on PC-386 computers. It was
- implemented first as an extension to the Minix operating system [9] and its
- first versions included support for the Minix filesystem only. The Minix
- filesystem contains two serious limitations: block addresses are stored in
- 16 bit integers, thus the maximal filesystem size is restricted to 64
- megabytes, and directories contain fixed-size entries and the maximal file name
- is 14 characters.
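The 64 megabyte figure follows from the width of the block addresses. The sketch below works it out, assuming the 1 kB block size used by the Minix filesystem (the paper does not state the block size at this point); the function name max_fs_mb is made up.

```sh
# max_fs_mb ADDRESS_BITS BLOCK_BYTES: largest filesystem, in megabytes,
# that can be addressed with block numbers of the given width:
# 2^ADDRESS_BITS blocks times BLOCK_BYTES bytes per block.
max_fs_mb() {
    echo $(( (1 << $1) * $2 / 1024 / 1024 ))
}
```

max_fs_mb 16 1024 prints 64, since 65536 blocks of 1024 bytes make 64 megabytes.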
-
- We have designed and implemented two new filesystems that are included in the
- standard Linux kernel. These filesystems, called "Extended File System" (Ext fs)
- and "Second Extended File System" (Ext2 fs), lift the limitations and add new
- features.
-
- In this paper, we describe the history of Linux filesystems. We briefly
- introduce the fundamental concepts implemented in Unix filesystems. We present
- the implementation of the Virtual File System layer in Linux and we detail the
- Second Extended File System kernel code and user mode tools. Last, we present
- performance measurements made on Linux and BSD filesystems and we conclude
- with the current status of Ext2fs and the future directions.
-
- A.1 History of Linux filesystems
-
- In its very early days, Linux was cross-developed under the Minix operating
- system. It was easier to share disks between the two systems than to design a
- new filesystem, so Linus Torvalds decided to implement support for the Minix
- filesystem in Linux. The Minix filesystem was an efficient and relatively
- bug-free piece of software.
-
- However, the restrictions in the design of the Minix filesystem were too
- limiting, so people started thinking about and working on the implementation of new
- filesystems in Linux.
-
- In order to ease the addition of new filesystems into the Linux kernel, a
- Virtual File System (VFS) layer was developed. The VFS layer was initially
- written by Chris Provenzano, and later rewritten by Linus Torvalds before it was
- integrated into the Linux kernel. It will be described in section A.3 of this
- paper.
-
- After the integration of the VFS in the kernel, a new filesystem, called the
- "Extended File System", was implemented in April 1992 and added to Linux 0.96c.
- This new filesystem removed the two big Minix limitations: its maximal size was
- 2 gigabytes and the maximal file name size was 255 characters. It was an
- improvement over the Minix filesystem but some problems were still present in
- it. There was no support for the separate access, inode modification, and data
- modification timestamps. The filesystem used linked lists to keep track of free
- blocks and inodes and this resulted in poor performance: as the filesystem was
- used, the lists became unsorted and the filesystem became fragmented.
-
- As a response to these problems, two new filesystems were released in Alpha
- version in January 1993: the Xia filesystem and the Second Extended File System.
- The Xia filesystem was heavily based on the Minix filesystem kernel code and
- only added a few improvements over this filesystem. Basically, it provided long
- file names, support for bigger partitions and support for the three timestamps.
- On the other hand, Ext2fs was based on the Extfs code with many reorganizations
- and many improvements. It had been designed with evolution in mind and contained
- space for future improvements. It will be described in more detail in section
- A.4.
-
- When the two new filesystems were first released, they provided essentially
- the same features. Due to its minimal design, Xia fs was more stable than
- Ext2fs. As the filesystems were used more widely, bugs were fixed in Ext2fs and
- lots of improvements and new features were integrated. Ext2fs is now very
- stable and has become the de-facto standard Linux filesystem.
-
- Table A.1 contains a summary of the features provided by the different
- filesystems.
-
- Table A.1: Summary of the filesystem features
- __________________________________________________________
- |                 | Minix FS | Ext FS | Ext2 FS | Xia FS |
- |_________________|__________|________|_________|________|
- | Max FS size     | 64 MB    | 2 GB   | 4 TB    | 2 GB   |
- | Max file size   | 64 MB    | 2 GB   | 2 GB    | 64 MB  |
- | Max file name   | 14/30 c  | 255 c  | 255 c   | 248 c  |
- | 3 timestamps    | No       | No     | Yes     | Yes    |
- | Extensible      | No       | No     | Yes     | No     |
- | Var. block size | No       | No     | Yes     | No     |
- | Maintained      | Yes      | No     | Yes     | ?      |
- |_________________|__________|________|_________|________|
-
-
-
- A.2 Basic File System Concepts
-
-
-
- Every Linux filesystem implements a basic set of common concepts derived from
- the Unix operating system [2]: files are represented by inodes, directories are
- simply files containing a list of entries and devices can be accessed by
- requesting I/O on special files.
-
- A.2.1 Inodes
-
- Each file is represented by a structure, called an inode. Each inode contains
- the description of the file: file type, access rights, owners, timestamps, size,
- pointers to data blocks. The addresses of data blocks allocated to a file are
- stored in its inode. When a user requests an I/O operation on the file, the
- kernel code converts the current offset to a block number, uses this number as
- an index in the block addresses table and reads or writes the physical block.
- Figure A.1 represents the structure of an inode.
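The conversion from a file offset to a block number is a plain division. The sketch below shows only that first step, with made-up function names; how the resulting index is then looked up in the inode's table of block addresses is not modeled here.

```sh
# block_index OFFSET BLOCK_SIZE: index of the block that holds byte OFFSET,
# used to look up the physical block address in the inode.
block_index() {
    echo $(( $1 / $2 ))
}

# block_offset OFFSET BLOCK_SIZE: position of that byte inside the block.
block_offset() {
    echo $(( $1 % $2 ))
}
```

With 1024-byte blocks, byte 5000 of a file lies in block 4 (counting from 0), at position 904 within that block.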
-
- Figure A.1: Structure of an inode
-
- A.2.2 Directories
-
- Directories are structured in a hierarchical tree. Each directory can contain
- files and subdirectories.
-
- Directories are implemented as a special type of files. Actually, a directory
- is a file containing a list of entries. Each entry contains an inode number and
- a file name. When a process uses a pathname, the kernel code searches in the
- directories to find the corresponding inode number. After the name has been
- converted to an inode number, the inode is loaded into memory and is used by
- subsequent requests.
-
- Figure A.2 represents a directory.
-
- A.2.3 Links
-
- Unix filesystems implement the concept of a link. Several names can be
- associated with an inode. The inode contains a field holding the number of
- links associated with the file. Adding a link simply consists of creating a
- directory entry whose inode number points to the inode, and incrementing the
- link count in the inode. When a link is deleted, i.e. when one uses the rm
- command to remove a filename, the kernel decrements the link count and
- deallocates the inode if this count becomes zero.
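-
- The link count can be observed from user space with the standard POSIX calls
- link(2), unlink(2) and fstat(2). The sketch below is only an illustration: the
- function name toy_link_counts and the temporary file names are invented, and
- it assumes that the given directory is writable and lies on a filesystem that
- supports hard links.
-

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Record the link count of a fresh file after creation, after adding a
 * second name, and after removing that second name.
 * Returns 0 on success and -1 on any error.
 */
int toy_link_counts(const char *dir, long counts[3])
{
    char first[1024], second[1040];
    struct stat st;
    int fd, linked = 0, result = -1;

    assert(dir != NULL && counts != NULL);
    if (snprintf(first, sizeof first, "%s/linkdemoXXXXXX", dir)
        >= (int)sizeof first)
        return -1;
    fd = mkstemp(first);               /* one name: one link */
    if (fd < 0)
        return -1;
    snprintf(second, sizeof second, "%s.alias", first);

    if (fstat(fd, &st) != 0)
        goto out;
    counts[0] = (long)st.st_nlink;

    if (link(first, second) != 0)      /* second entry, same inode */
        goto out;
    linked = 1;
    if (fstat(fd, &st) != 0)
        goto out;
    counts[1] = (long)st.st_nlink;

    if (unlink(second) != 0)           /* remove the second name */
        goto out;
    linked = 0;
    if (fstat(fd, &st) != 0)
        goto out;
    counts[2] = (long)st.st_nlink;
    result = 0;
out:
    if (linked)
        unlink(second);
    close(fd);
    unlink(first);                     /* clean up the temporary file */
    return result;
}
```

- On such a filesystem the three recorded counts are expected to be 1, 2 and 1.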
-
- Figure A.2: Structure of a directory
-
- This type of link is called a hard link and can only be used within a single
- filesystem: it is impossible to create cross-filesystem hard links. Moreover,
- hard links can only point to files: hard links to directories are not allowed,
- in order to prevent cycles from appearing in the directory tree.
-
- Another kind of link exists in most Unix filesystems. Symbolic links are
- simply files which contain a filename. When the kernel encounters a symbolic
- link during a pathname to inode conversion, it replaces the name of the link
- with its contents, i.e. the name of the target file, and restarts the pathname
- interpretation. Since a symbolic link does not point to an inode, it is
- possible to create cross-filesystem symbolic links. Symbolic links can point to
- any type of file, even to nonexistent files. Symbolic links are very useful
- because they don't have the limitations associated with hard links. However,
- they use some disk space, allocated for their inode and their data blocks, and
- cause an overhead in the pathname to inode conversion because the kernel has to
- restart the name interpretation when it encounters a symbolic link.
-
- A.2.4 Device special files
-
- In Unix-like operating systems, devices can be accessed via special files. A
- device special file does not use any space on the filesystem. It is only an
- access point to the device driver.
-
- Two types of special files exist: character and block special files. The
- former allows I/O operations in character mode, while the latter requires data
- to be written in block mode via the buffer cache functions. When an I/O request
- is made on a special file, it is forwarded to a (pseudo) device driver. A
- special file is referenced by a major number, which identifies the device type,
- and a minor number, which identifies the unit.
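-
- A device number can be modelled as a pair packed into a single integer. The
- sketch below assumes a 16-bit device number with the major number in the high
- byte and the minor number in the low byte; this packing, and the toy_* names,
- are assumptions of the example, not something stated in the text.
-

```c
#include <assert.h>

/* Assumed layout: 8 bits of major number, 8 bits of minor number. */
typedef unsigned short toy_dev_t;

/* Pack a (major, minor) pair into one device number. */
toy_dev_t toy_mkdev(unsigned int major, unsigned int minor)
{
    assert(major <= 0xff && minor <= 0xff);
    return (toy_dev_t)((major << 8) | minor);
}

/* The major number selects the device type (the driver). */
unsigned int toy_major(toy_dev_t dev)
{
    return dev >> 8;
}

/* The minor number selects the unit handled by that driver. */
unsigned int toy_minor(toy_dev_t dev)
{
    return dev & 0xff;
}
```

- For example, toy_mkdev(3, 1) gives 0x0301, from which toy_major and toy_minor
- recover 3 and 1.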
-
- A.3 The Virtual File System
-
- A.3.1 Principle
-
- The Linux kernel contains a Virtual File System layer which is used during
- system calls acting on files. The VFS is an indirection layer which handles the
- file oriented system calls and calls the necessary functions in the physical
- filesystem code to do the I/O.
-
- This indirection mechanism is frequently used in Unix-like operating systems
- to ease the integration and the use of several filesystem types [5, 8].
-
- When a process issues a file oriented system call, the kernel calls a
- function contained in the VFS. This function handles the structure independent
- manipulations and redirects the call to a function contained in the physical
- filesystem code, which is responsible for handling the structure dependent
- operations. Filesystem code uses the buffer cache functions to request I/O on
- devices. This scheme is illustrated in figure A.3.
-
- A.3.2 The VFS structure
-
- The VFS defines a set of functions that every filesystem has to implement.
- This interface is made up of a set of operations associated with three kinds of
- objects: filesystems, inodes, and open files.
-
- The VFS knows about filesystem types supported in the kernel. It uses a table
- defined during the kernel configuration. Each entry in this table describes a
- filesystem type: it contains the name of the filesystem type and a pointer to a
- function called during the mount operation. When a filesystem is to be mounted,
- the appropriate mount function is called. This function is responsible for
- reading the superblock from the disk, initializing its internal variables, and
- returning a mounted filesystem descriptor to the VFS. After the filesystem is
- mounted, the VFS functions can use this descriptor to access the physical
- filesystem routines.
-
- A mounted filesystem descriptor contains several kinds of data: information
- that is common to every filesystem type, pointers to functions provided by the
- physical filesystem kernel code, and private data maintained by the physical
- filesystem code. The function pointers contained in the filesystem descriptors
- allow the VFS to access the filesystem internal routines.
-
- Figure A.3: The VFS Layer
-
- Two other types of descriptors are used by the VFS: an inode descriptor and
- an open file descriptor. Each descriptor contains information related to files
- in use and a set of operations provided by the physical filesystem code. While
- the inode descriptor contains pointers to functions that can be used to act on
- any file (e.g. create, unlink), the file descriptor contains pointers to
- functions which can only act on open files (e.g. read, write).
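-
- The indirection through function pointers can be sketched in a few lines. The
- structures and names below (toy_file, toy_file_operations, a "ramfs" that
- reads from a string) are hypothetical and much simpler than the real kernel
- descriptors; they only show how a generic entry point hands the request to
- the code of one particular filesystem.
-

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Operations that a "physical" filesystem provides for open files. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count);
};

/* Open file descriptor: common part plus filesystem private data. */
struct toy_file {
    const struct toy_file_operations *f_op;
    const char *data;   /* private to the toy filesystem */
    size_t size;
    size_t pos;
};

/* Filesystem-specific read: copies bytes out of an in-memory string. */
static long ramfs_read(struct toy_file *f, char *buf, size_t count)
{
    size_t left = f->size - f->pos;

    if (count > left)
        count = left;
    memcpy(buf, f->data + f->pos, count);
    f->pos += count;
    return (long)count;
}

static const struct toy_file_operations ramfs_fops = { ramfs_read };

/* "Open" a file of the toy filesystem. */
struct toy_file toy_ramfs_open(const char *contents)
{
    struct toy_file f;

    assert(contents != NULL);
    f.f_op = &ramfs_fops;
    f.data = contents;
    f.size = strlen(contents);
    f.pos = 0;
    return f;
}

/* Generic layer: knows nothing about the filesystem behind f. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t count)
{
    if (f == NULL || f->f_op == NULL || f->f_op->read == NULL)
        return -1;
    return f->f_op->read(f, buf, count);
}
```

- Supporting another filesystem type in this model only requires another
- operations table; toy_vfs_read itself does not change.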
-
- A.4 The Second Extended File System
-
- A.4.1 Motivations
-
- The Second Extended File System has been designed and implemented to fix some
- problems present in the first Extended File System. Our goal was to provide a
- powerful filesystem, which implements Unix file semantics and offers advanced
- features.
-
- Of course, we wanted Ext2fs to have excellent performance. We also wanted to
- provide a very robust filesystem in order to reduce the risk of data loss in
- intensive use. Last, but not least, Ext2fs had to include provision for
- extensions to allow users to benefit from new features without reformatting
- their filesystem.
-
-
- A.4.2 "Standard" Ext2fs features
-
-
- Ext2fs supports the standard Unix file types: regular files, directories,
- device special files and symbolic links.
-
- Ext2fs is able to manage filesystems created on really big partitions. While
- the original kernel code restricted the maximal filesystem size to 2 GB, recent
- work in the VFS layer has raised this limit to 4 TB. Thus, it is now possible
- to use big disks without the need to create many partitions.
-
- Ext2fs provides long file names. It uses variable length directory entries.
- The maximal file name size is 255 characters. This limit could be extended to
- 1012 if needed.
-
- Ext2fs reserves some blocks for the super user (root). Normally, 5% of the
- blocks are reserved. This allows the administrator to recover easily from
- situations where user processes fill up filesystems.
-
- A.4.3 "Advanced" Ext2fs features
-
-
- In addition to the standard Unix features, Ext2fs supports some extensions
- which are not usually present in Unix filesystems.
-
- File attributes allow the users to modify the kernel behavior when acting on
- a set of files. One can set attributes on a file or on a directory. In the
- latter case, new files created in the directory inherit these attributes.
-
- BSD or System V Release 4 semantics can be selected at mount time. A mount
- option allows the administrator to choose the file creation semantics. On a
- filesystem mounted with BSD semantics, files are created with the same group id
- as their parent directory. System V semantics are a bit more complex: if a
- directory has the setgid bit set, new files inherit the group id of the
- directory and subdirectories inherit the group id and the setgid bit;
- otherwise, files and subdirectories are created with the primary group id of
- the calling process.
-
- BSD-like synchronous updates can be used in Ext2fs. A mount option allows
- the administrator to request that metadata (inodes, bitmap blocks, indirect
- blocks and directory blocks) be written synchronously to the disk when they are
- modified. This can be useful to maintain strict metadata consistency, but it
- leads to poor performance. In practice, this feature is not normally used:
- in addition to the performance loss associated with synchronous updates of
- the metadata, it can cause corruption in the user data which will not be
- flagged by the filesystem checker.
-
- Ext2fs allows the administrator to choose the logical block size when
- creating the filesystem. Block sizes can typically be 1024, 2048 and 4096 bytes.
- Using big block sizes can speed up I/O since fewer I/O requests, and thus fewer
- disk head seeks, need to be done to access a file. On the other hand, big blocks
- waste more disk space: on average, the last block allocated to a file is
- only half full, so as blocks get bigger, more space is wasted in the last block
- of each file. In addition, most of the advantages of larger block sizes are
- obtained by the Ext2 filesystem's preallocation techniques (see section A.4.5).
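-
- The space trade-off can be made concrete with a little arithmetic. The two
- helper functions below are illustrative only (their names are invented): the
- first computes the unused tail of the last block of one file, the second
- applies the "half a block per file" average mentioned above.
-

```c
#include <assert.h>

/* Bytes left unused in the last block of a file of the given size. */
unsigned long toy_tail_waste(unsigned long file_size,
                             unsigned long block_size)
{
    unsigned long used_in_last;

    assert(block_size > 0);
    used_in_last = file_size % block_size;
    return used_in_last == 0 ? 0 : block_size - used_in_last;
}

/* Expected waste for nfiles files, assuming half a block per file. */
unsigned long toy_expected_waste(unsigned long nfiles,
                                 unsigned long block_size)
{
    return nfiles * (block_size / 2);
}
```

- A 1500-byte file wastes 548 bytes with 1024-byte blocks but 2596 bytes with
- 4096-byte blocks. Under the half-block average, 10000 files waste about 5 MB
- with 1024-byte blocks and about 20 MB with 4096-byte blocks.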
-
- Ext2fs implements fast symbolic links. A fast symbolic link does not use any
- data block on the filesystem. The target name is not stored in a data block but
- in the inode itself. This policy saves some disk space (no data block needs
- to be allocated) and speeds up link operations (there is no need to read a data
- block when accessing such a link). Of course, the space available in the inode
- is limited, so not every link can be implemented as a fast symbolic link. The
- maximal size of the target name in a fast symbolic link is 60 characters. We
- plan to extend this scheme to small files in the near future.
-
- Ext2fs keeps track of the filesystem state. A special field in the superblock
- is used by the kernel code to indicate the status of the file system. When a
- filesystem is mounted in read/write mode, its state is set to "Not Clean". When
- it is unmounted or remounted in read-only mode, its state is reset to "Clean".
- At boot time, the filesystem checker uses this information to decide if a
- filesystem must be checked. The kernel code also records errors in this field.
- When an inconsistency is detected by the kernel code, the filesystem is marked
- as "Erroneous". The filesystem checker tests this to force the check of the
- filesystem regardless of its apparently clean state.
-
- Always skipping filesystem checks may sometimes be dangerous, so Ext2fs
- provides two ways to force checks at regular intervals. A mount counter is
- maintained in the superblock. Each time the filesystem is mounted in read/write
- mode, this counter is incremented. When it reaches a maximal value (also
- recorded in the superblock), the filesystem checker forces the check even if the
- filesystem is "Clean". A last check time and a maximal check interval are also
- maintained in the superblock. These two fields allow the administrator to
- request periodic checks. When the maximal check interval has been reached, the
- checker ignores the filesystem state and forces a filesystem check.
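-
- Taken together, the state field, the mount counter and the check interval
- lead to a simple decision at boot time. The sketch below is one reading of
- the rules above, not the checker's actual code: the structure, its field
- names, and the convention that an interval of 0 disables time-based checks
- are assumptions of the example.
-

```c
#include <assert.h>
#include <stddef.h>

enum toy_fs_state { TOY_CLEAN, TOY_NOT_CLEAN, TOY_ERRONEOUS };

/* Hypothetical subset of the superblock fields used by the decision. */
struct toy_super {
    enum toy_fs_state state;
    unsigned int mount_count;      /* read/write mounts since last check */
    unsigned int max_mount_count;  /* forces a check when reached */
    long last_check;               /* time of last check, in seconds */
    long check_interval;           /* maximal interval; 0 = disabled */
};

/* Returns 1 when the filesystem must be checked, 0 when it can be skipped. */
int toy_needs_check(const struct toy_super *sb, long now)
{
    assert(sb != NULL);
    if (sb->state != TOY_CLEAN)    /* "Not Clean" or "Erroneous" */
        return 1;
    if (sb->mount_count >= sb->max_mount_count)
        return 1;
    if (sb->check_interval > 0 &&
        now - sb->last_check >= sb->check_interval)
        return 1;
    return 0;
}
```

- A clean filesystem is therefore skipped only while both the mount counter
- and the elapsed time stay below their limits.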
-
- Ext2fs offers tools to tune the filesystem behavior. The tune2fs program can
- be used to modify:
-
-   * the error behavior. When an inconsistency is detected by the kernel code,
-     the filesystem is marked as "Erroneous" and one of the three following
-     actions can be taken: continue normal execution, remount the filesystem
-     in read-only mode to avoid corrupting the filesystem, or make the kernel
-     panic and reboot to run the filesystem checker.
-   * the maximal mount count.
-   * the maximal check interval.
-   * the number of logical blocks reserved for the super user.
-
- Mount options can also be used to change the kernel error behavior.
-
- An attribute allows the users to request secure deletion on files. When such
- a file is deleted, random data is written in the disk blocks previously
- allocated to the file. This prevents malicious people from gaining access to the
- previous content of the file by using a disk editor.
-
- Last, new types of files inspired by the 4.4 BSD filesystem have recently
- been added to Ext2fs. Immutable files can only be read: nobody can write or
- delete them. This can be used to protect sensitive configuration files.
- Append-only files can be opened in write mode but data is always appended at the
- end of the file. Like immutable files, they cannot be deleted or renamed. This
- is especially useful for log files, which can only grow.
-
-
- A.4.4 Physical Structure
-
- The physical structure of Ext2 filesystems has been strongly influenced by the
- layout of the BSD filesystem [6]. A filesystem is made up of block groups. Block
- groups are analogous to BSD FFS's cylinder groups. However, block groups are not
- tied to the physical layout of the blocks on the disk, since modern drives tend
- to be optimized for sequential access and hide their physical geometry from the
- operating system.
-
- The physical structure of a filesystem is represented in figure A.4.
-
- Each block group contains a redundant copy of crucial filesystem control
- information (the superblock and the filesystem descriptors) and also contains a
- part of the filesystem (a block bitmap, an inode bitmap, a piece of the inode
- table, and data blocks). The structure of a block group is represented in figure
- A.5.
-
-
-
-  ____________________________________________
- | Boot   | Block   | Block   | ... | Block   |
- | Sector | Group 1 | Group 2 | ... | Group N |
- |________|_________|_________|_____|_________|
-
-
-
- Figure A.4: Physical structure of an Ext2 filesystem
-
-
-  ________________________________________________________
- | Super | FS          | Block  | Inode  | Inode | Data   |
- | Block | descriptors | Bitmap | Bitmap | Table | Blocks |
- |_______|_____________|________|________|_______|________|
-
-
-
- Figure A.5: Structure of a block group
-
-
- Using block groups is a big win in terms of reliability: since the control
- structures are replicated in each block group, it is easy to recover from a
- filesystem where the superblock has been corrupted. This structure also helps to
- get good performance: by reducing the distance between the inode table and the
- data blocks, it is possible to reduce the disk head seeks during I/O on files.
-
- In Ext2fs, directories are managed as linked lists of variable length
- entries. Each entry contains the inode number, the entry length, the file name
- and its length. By using variable length entries, it is possible to implement
- long file names without wasting disk space in directories. The structure of a
- directory entry is shown in figure A.6.
-
-
-  ______________________________________________________
- | inode number | entry length | name length | filename |
- |______________|______________|_____________|__________|
-
-
-
- Figure A.6: Structure of a directory entry
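-
- Because every entry records its own length, a directory block can be walked
- by jumping from one entry to the next. The sketch below illustrates this on
- a byte buffer. The field widths (a 4-byte inode number followed by a 2-byte
- entry length and a 2-byte name length, in host byte order) and the toy_*
- names are assumptions of the example, not a description of the on-disk
- format.
-

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed part: inode (4 bytes), entry length (2), name length (2). */
#define TOY_DIRENT_HEADER 8

/* Write one entry at offset off; returns the offset of the next entry. */
size_t toy_put_entry(unsigned char *block, size_t off, uint32_t ino,
                     uint16_t rec_len, const char *name)
{
    uint16_t name_len = (uint16_t)strlen(name);

    assert(rec_len >= TOY_DIRENT_HEADER + name_len);
    memcpy(block + off, &ino, sizeof ino);
    memcpy(block + off + 4, &rec_len, sizeof rec_len);
    memcpy(block + off + 6, &name_len, sizeof name_len);
    memcpy(block + off + TOY_DIRENT_HEADER, name, name_len);
    return off + rec_len;
}

/* Follow the entry lengths through the first `used` bytes of the block;
 * returns the inode number stored for name, or 0 when it is not found. */
uint32_t toy_find_entry(const unsigned char *block, size_t used,
                        const char *name)
{
    size_t off = 0, want = strlen(name);

    while (off + TOY_DIRENT_HEADER <= used) {
        uint32_t ino;
        uint16_t rec_len, name_len;

        memcpy(&ino, block + off, sizeof ino);
        memcpy(&rec_len, block + off + 4, sizeof rec_len);
        memcpy(&name_len, block + off + 6, sizeof name_len);
        if (rec_len < TOY_DIRENT_HEADER)
            break;                     /* corrupt entry: stop the walk */
        if (name_len == want &&
            off + TOY_DIRENT_HEADER + name_len <= used &&
            memcmp(block + off + TOY_DIRENT_HEADER, name, want) == 0)
            return ino;
        off += rec_len;                /* jump to the next entry */
    }
    return 0;
}
```

- Entries with lengths 16, 40 and 12 for the names file1, long_file_name and
- f2 occupy 68 bytes in total. Note that an entry may be longer than its name
- strictly requires.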
-
-
-
- As an example, figure A.7 represents the structure of a directory containing
- three files: file1, long_file_name, and f2.
-
-  ____________________________________________________________________________
- | i1 | 16 | 05 | file1 || i2 | 40 | 14 | long_file_name || i3 | 12 | 02 | f2 |
- |____|____|____|_______||____|____|____|________________||____|____|____|____|
-
- Figure A.7: Example of directory
-
- A.4.5 Performance optimizations
-
- The Ext2fs kernel code contains many performance optimizations, which tend to
- improve I/O speed when reading and writing files.
-
- Ext2fs takes advantage of the buffer cache management by performing
- readaheads: when a block has to be read, the kernel code requests the I/O on
- several contiguous blocks. This way, it tries to ensure that the next block to
- read will already be loaded into the buffer cache. Readaheads are normally
- performed during sequential reads on files, and Ext2fs extends them to
- directory reads, either explicit reads (readdir(2) calls) or implicit ones
- (namei kernel directory lookup).
-
- Ext2fs also contains many allocation optimizations. Block groups are used to
- cluster together related inodes and data: the kernel code always tries to
- allocate data blocks for a file in the same group as its inode. This is intended
- to reduce the disk head seeks made when the kernel reads an inode and its data
- blocks.
-
- When writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when
- allocating a new block. Preallocation hit rates are around 75% even on very
- full filesystems. This preallocation achieves good write performance under
- heavy load. It also allows contiguous blocks to be allocated to files, which
- speeds up future sequential reads.
-
- These two allocation optimizations produce a very good locality of:
-
-   * related files through block groups
-   * related blocks through the 8 bits clustering of block allocations.
-
- A.5 The Ext2fs library
-
- To allow user mode programs to manipulate the control structures of an Ext2
- filesystem, the libext2fs library was developed. This library provides routines
- which can be used to examine and modify the data of an Ext2 filesystem, by
- accessing the filesystem directly through the physical device.
-
- The Ext2fs library was designed to allow maximal code reuse through the use
- of software abstraction techniques. For example, several different iterators are
- provided. A program can simply pass in a function to ext2fs_block_iterate(),
- which will be called for each block in an inode. Another iterator function
- allows a user-provided function to be called for each file in a directory.
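-
- The iterator idea can be illustrated with a self-contained toy. The code
- below is not the libext2fs interface: toy_block_iterate, its callback type
- and the toy_inode_blocks structure are invented for the example, and only
- show the general pattern of passing a function that is called once per
- block.
-

```c
#include <assert.h>
#include <stddef.h>

/* Callback type: return nonzero to stop the iteration early. */
typedef int (*toy_block_fn)(unsigned long blocknr, void *priv);

/* Hypothetical stand-in for the list of blocks owned by an inode. */
struct toy_inode_blocks {
    const unsigned long *blocks;
    size_t count;
};

/* Call fn for each block of the inode; returns the number of calls made. */
size_t toy_block_iterate(const struct toy_inode_blocks *ino,
                         toy_block_fn fn, void *priv)
{
    size_t i;

    assert(ino != NULL && fn != NULL);
    for (i = 0; i < ino->count; i++)
        if (fn(ino->blocks[i], priv) != 0)
            return i + 1;
    return ino->count;
}

/* Example callback: remember the highest block number seen so far. */
int toy_track_max(unsigned long blocknr, void *priv)
{
    unsigned long *max = priv;

    if (blocknr > *max)
        *max = blocknr;
    return 0;
}
```

- The caller keeps its own state behind the priv pointer, so the same iterator
- can serve very different tools without knowing what they compute.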
-
- Many of the Ext2fs utilities (mke2fs, e2fsck, tune2fs, dumpe2fs, and debugfs)
- use the Ext2fs library. This greatly simplifies the maintenance of these
- utilities, since any changes to reflect new features in the Ext2 filesystem
- format need only be made in one place: the Ext2fs library. This code reuse
- also results in smaller binaries, since the Ext2fs library can be built as a
- shared library image.
-
- Because the interfaces of the Ext2fs library are so abstract and general, new
- programs which require direct access to the Ext2fs filesystem can very easily be
- written. For example, the Ext2fs library was used during the port of the 4.4BSD
- dump and restore backup utilities. Very few changes were needed to adapt these
- tools to Linux: only a few filesystem dependent functions had to be replaced by
- calls to the Ext2fs library.
-
- The Ext2fs library provides access to several classes of operations. The
- first class consists of the filesystem-oriented operations. A program can open
- and close a filesystem, read and write the bitmaps, and create a new filesystem
- on the disk. Functions are also available to manipulate the filesystem's bad
- blocks list.
-
- The second class of operations affects directories. A caller of the Ext2fs
- library can create and expand directories, as well as add and remove directory
- entries. Functions are also provided both to resolve a pathname to an inode
- number and to determine a pathname of an inode given its inode number.
-
- The final class of operations is oriented around inodes. It is possible to
- scan the inode table, read and write inodes, and scan through all of the blocks
- in an inode. Allocation and deallocation routines are also available and allow
- user mode programs to allocate and free blocks and inodes.
-
- A.6 The Ext2fs tools
-
-
-
- Powerful management tools have been developed for Ext2fs. These utilities are
- used to create, modify, and correct any inconsistencies in Ext2 filesystems. The
- mke2fs program is used to initialize a partition to contain an empty Ext2
- filesystem.
-
- The tune2fs program can be used to modify the filesystem parameters. As
- explained in section A.4.3, it can change the error behavior, the maximal mount
- count, the maximal check interval, and the number of logical blocks reserved for
- the super user.
-
- The most interesting tool is probably the filesystem checker. E2fsck is
- intended to repair filesystem inconsistencies after an unclean shutdown of the
- system. The original version of e2fsck was based on Linus Torvalds' fsck
- program for the Minix filesystem. However, the current version of e2fsck was
- rewritten from scratch, using the Ext2fs library, and is much faster and can
- correct more filesystem inconsistencies than the original version.
-
- The e2fsck program is designed to run as quickly as possible. Since
- filesystem checkers tend to be disk bound, this was done by optimizing the
- algorithms used by e2fsck so that filesystem structures are not repeatedly
- accessed from the disk. In addition, inodes and directories are checked in
- block number order to reduce the amount of time spent in disk seeks. Many of
- these ideas were originally explored by Bina and Emrath [3], although they have
- since been further refined by the authors.
-
- In pass 1, e2fsck iterates over all of the inodes in the filesystem and
- performs checks over each inode as an unconnected object in the filesystem.
- That is, these checks do not require any cross-checks to other filesystem
- objects. Examples of such checks include making sure the file mode is legal, and
- that all of the blocks in the inode are valid block numbers. During pass 1,
- bitmaps indicating which blocks and inodes are in use are compiled.
-
- If e2fsck notices data blocks which are claimed by more than one inode, it
- invokes passes 1B through 1D to resolve these conflicts, either by cloning the
- shared blocks so that each inode has its own copy of the shared block, or by
- deallocating one or more of the inodes.
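-
- The bookkeeping of pass 1 can be sketched with two bitmaps: one for blocks in
- use and one for blocks claimed more than once. This is a simplified
- illustration under stated assumptions (a tiny filesystem of 64 blocks,
- invented toy_* names), not the e2fsck code; it only detects shared blocks and
- does not resolve the conflicts.
-

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NBLOCKS 64  /* assumed size of the toy filesystem, in blocks */

struct toy_bitmap {
    unsigned char bits[(TOY_NBLOCKS + CHAR_BIT - 1) / CHAR_BIT];
};

/* Set the bit for blk and report whether it was already set. */
static int toy_test_and_set(struct toy_bitmap *map, unsigned long blk)
{
    unsigned char mask = (unsigned char)(1u << (blk % CHAR_BIT));
    int was_set = (map->bits[blk / CHAR_BIT] & mask) != 0;

    map->bits[blk / CHAR_BIT] |= mask;
    return was_set;
}

int toy_bit_is_set(const struct toy_bitmap *map, unsigned long blk)
{
    return (map->bits[blk / CHAR_BIT] >> (blk % CHAR_BIT)) & 1;
}

/*
 * Record the blocks of one inode. Blocks already claimed by an earlier
 * inode are also marked in `shared`. Returns the number of invalid
 * (out of range) block numbers found in the inode.
 */
size_t toy_pass1_inode(const unsigned long *blocks, size_t count,
                       struct toy_bitmap *in_use,
                       struct toy_bitmap *shared)
{
    size_t i, invalid = 0;

    assert(in_use != NULL && shared != NULL);
    for (i = 0; i < count; i++) {
        if (blocks[i] >= TOY_NBLOCKS) {
            invalid++;
            continue;
        }
        if (toy_test_and_set(in_use, blocks[i]))
            toy_test_and_set(shared, blocks[i]);
    }
    return invalid;
}
```

- After all inodes have been processed, a nonempty shared bitmap is the signal
- that the conflict-resolution passes are needed.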
-
- Pass 1 takes the longest time to execute, since all of the inodes have to be
- read into memory and checked. To reduce the I/O time necessary in future passes,
- critical filesystem information is cached in memory. The most important example
- of this technique is the location on disk of all of the directory blocks on the
- filesystem. This obviates the need to re-read the directory inode structures
- during pass 2 to obtain this information.
-
- Pass 2 checks directories as unconnected objects. Since directory entries do
- not span disk blocks, each directory block can be checked individually without
- reference to other directory blocks. This allows e2fsck to sort all of the
- directory blocks by block number, and check directory blocks in ascending order,
- thus decreasing disk seek time. The directory blocks are checked to make sure
- that the directory entries are valid, and contain references to inode numbers
- which are in use (as determined by pass 1).
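-
- The effect of sorting directory blocks before reading them can be sketched
- with the standard qsort function. The structure and function names below are
- hypothetical, and the sum of distances between consecutive block numbers is
- only a crude stand-in for real disk seek time.
-

```c
#include <assert.h>
#include <stdlib.h>

/* One directory block recorded during pass 1. */
struct toy_dir_block {
    unsigned long dir_ino;  /* directory that owns the block */
    unsigned long blocknr;  /* location of the block on disk */
};

static int toy_cmp_blocknr(const void *a, const void *b)
{
    const struct toy_dir_block *x = a, *y = b;

    return (x->blocknr > y->blocknr) - (x->blocknr < y->blocknr);
}

/* Order the blocks by ascending block number. */
void toy_sort_dir_blocks(struct toy_dir_block *list, size_t count)
{
    assert(list != NULL || count == 0);
    if (count > 1)
        qsort(list, count, sizeof list[0], toy_cmp_blocknr);
}

/* Sum of the distances between consecutively visited blocks. */
unsigned long toy_seek_distance(const struct toy_dir_block *list,
                                size_t count)
{
    unsigned long total = 0;
    size_t i;

    for (i = 1; i < count; i++) {
        unsigned long a = list[i - 1].blocknr, b = list[i].blocknr;
        total += a > b ? a - b : b - a;
    }
    return total;
}
```

- For blocks 900, 40, 500 and 41 visited in that order, the distance is 1779
- blocks; after sorting it drops to 860.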
-
- For the first directory block in each directory inode, the `.' and `..'
- entries are checked to make sure they exist, and that the inode number for the
- `.' entry matches the current directory. (The inode number for the `..' entry is
- not checked until pass 3.)
-
- Pass 2 also caches information concerning the parent directory in which each
- directory is linked. (If a directory is referenced by more than one directory,
- the second reference of the directory is treated as an illegal hard link, and it
- is removed).
-
- It is worth noting that at the end of pass 2, nearly all of the disk I/O
- which e2fsck needs to perform is complete. Information required by passes 3, 4
- and 5 is cached in memory; hence, the remaining passes of e2fsck are largely
- CPU bound, and take less than 5-10% of the total running time of e2fsck.
-
- In pass 3, the directory connectivity is checked. E2fsck traces the path of
- each directory back to the root, using information that was cached during pass
- 2. At this time, the `..' entry for each directory is also checked to make sure
- it is valid. Any directories which can not be traced back to the root are linked
- to the /lost+found directory.
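-
- The connectivity check amounts to following cached parent pointers until the
- root is reached. The sketch below is an illustration only: the parent array
- stands in for the information cached during pass 2, the use of inode 2 for
- the root and of 0 for "no parent" are assumptions of the example, and the
- step limit is simply one way to stop on a loop.
-

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ROOT 2  /* assumed inode number of the root directory */

/*
 * parent[i] is the inode number of the directory in which directory i
 * is linked, or 0 when no parent was found. Returns 1 when dir can be
 * traced back to the root, 0 when it is disconnected or part of a loop.
 */
int toy_reaches_root(const unsigned long *parent, size_t n,
                     unsigned long dir)
{
    size_t steps;

    assert(parent != NULL);
    for (steps = 0; steps <= n; steps++) {
        if (dir == TOY_ROOT)
            return 1;
        if (dir == 0 || dir >= n)
            return 0;                  /* no parent: disconnected */
        dir = parent[dir];             /* climb one level */
    }
    return 0;                          /* more than n steps: a loop */
}
```

- Directories for which this function returns 0 are the candidates for
- reconnection under /lost+found.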
-
- In pass 4, e2fsck checks the reference counts for all inodes, by iterating
- over all the inodes and comparing the link counts (which were cached in pass 1)
- against internal counters computed during passes 2 and 3. Any undeleted files
- with a zero link count are also linked to the /lost+found directory during this
- pass.
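-
- The comparison made in pass 4 can be sketched as a small decision function.
- The names are hypothetical, and treating "zero link count" as "no directory
- entry was counted for an inode that is in use" is an interpretation of the
- paragraph above rather than something it states explicitly.
-

```c
#include <assert.h>
#include <stddef.h>

/* Per-inode data available at pass 4 (hypothetical field names). */
struct toy_inode_counts {
    unsigned int stored_links;   /* link count read from the inode */
    unsigned int counted_links;  /* directory entries counted earlier */
    int in_use;                  /* nonzero for undeleted inodes */
};

enum toy_pass4_action { TOY_OK, TOY_FIX_COUNT, TOY_RECONNECT };

/* Decide what to do with one inode. */
enum toy_pass4_action toy_pass4_check(const struct toy_inode_counts *ino)
{
    assert(ino != NULL);
    if (!ino->in_use)
        return TOY_OK;
    if (ino->counted_links == 0)
        return TOY_RECONNECT;    /* no name left: link into /lost+found */
    if (ino->stored_links != ino->counted_links)
        return TOY_FIX_COUNT;    /* stored count disagrees with reality */
    return TOY_OK;
}
```

- Since both counters are already in memory at this point, the pass needs no
- further reads from the inode table.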
-
- Finally, in pass 5, e2fsck checks the validity of the filesystem summary
- information. It compares the block and inode bitmaps which were constructed
- during the previous passes against the actual bitmaps on the filesystem, and
- corrects the on-disk copies if necessary.
-
- The filesystem debugger is another useful tool. Debugfs is a powerful
- program which can be used to examine and change the state of a filesystem.
- Basically, it provides an interactive interface to the Ext2fs library: commands
- typed by the user are translated into calls to the library routines.
-
- Debugfs can be used to examine the internal structures of a filesystem,
- manually repair a corrupted filesystem, or create test cases for e2fsck.
- Unfortunately, this program can be dangerous if it is used by people who do not
- know what they are doing; it is very easy to destroy a filesystem with this
- tool. For this reason, debugfs opens filesystems for read-only access by
- default. The user must explicitly specify the -w flag in order to use debugfs
- to open a filesystem for read/write access.
-
- A.7 Performance Measurements
-
-
-
- A.7.1 Description of the benchmarks
-
- We have run benchmarks to measure filesystem performance. The benchmarks were
- run on a mid-range PC based on an i486DX2 processor, with 16 MB of memory and
- two 420 MB IDE disks. The tests were run on Ext2 fs and Xia fs (Linux 1.1.62)
- and on the BSD Fast filesystem in asynchronous and synchronous mode (FreeBSD 2.0
- Alpha, based on the 4.4BSD Lite distribution).
-
- We have run two different benchmarks. The Bonnie benchmark tests I/O speed
- on a big file; the file size was set to 60 MB during the tests. It writes data
- to the file using character based I/O, rewrites the contents of the whole file,
- writes data using block based I/O, reads the file using character I/O and block
- I/O, and seeks into the file. The Andrew Benchmark was developed at Carnegie
- Mellon University and has been used at the University of California at Berkeley
- to benchmark BSD FFS and LFS. It runs in five phases: it creates a directory
- hierarchy, makes a copy of the data, recursively examines the status of every
- file, examines every byte of every file, and compiles several of the files.
-
- A.7.2 Results of the Bonnie benchmark
-
- The results of the Bonnie benchmark are presented in table A.2.
-
-
-
- Table A.2: Results of the Bonnie benchmark
-  _________________________________________________________________________
- |           | Char Write | Block Write | Rewrite | Char Read | Block Read |
- |           | (KB/s)     | (KB/s)      | (KB/s)  | (KB/s)    | (KB/s)     |
- |___________|____________|_____________|_________|___________|____________|
- | BSD Async |        710 |         684 |     401 |       721 |        888 |
- | BSD Sync  |        699 |         677 |     400 |       710 |        878 |
- | Ext2 fs   |        452 |        1237 |     536 |       397 |       1033 |
- | Xia fs    |        440 |         704 |     380 |       366 |        895 |
- |___________|____________|_____________|_________|___________|____________|
-
- The results are very good in block oriented I/O: Ext2 fs outperforms other
- filesystems. This is clearly a benefit of the optimizations included in the
- allocation routines. Writes are fast because data is written in cluster mode.
- Reads are fast because contiguous blocks have been allocated to the file. Thus
- there is no head seek between two reads and the readahead optimizations can be
- fully used.
-
- On the other hand, performance is better in the FreeBSD operating system in
- character oriented I/O. This is probably due to the fact that FreeBSD and Linux
- do not use the same stdio routines in their respective C libraries. It seems
- that FreeBSD has a more optimized character I/O library and its performance is
- better.
-
- A.7.3 Results of the Andrew benchmark
-
-
-
- The results of the Andrew benchmark are presented in table A.3.
-
-
-
-
- Table A.3: Results of the Andrew benchmark
-  ____________________________________________________
- |           | P1     | P2   | P3   | P4    | P5      |
- |           | Create | Copy | Stat | Grep  | Compile |
- |           | (ms)   | (ms) | (ms) | (ms)  | (ms)    |
- |___________|________|______|______|_______|_________|
- | BSD Async |   2203 | 7391 | 6319 | 17466 |   75314 |
- | BSD Sync  |   2330 | 7732 | 6317 | 17499 |   75681 |
- | Ext2 fs   |    790 | 4791 | 7235 | 11685 |   63210 |
- | Xia fs    |    934 | 5402 | 8400 | 12912 |   66997 |
- |___________|________|______|______|_______|_________|
-
- The results of the first two phases show that Linux benefits from its
- asynchronous metadata I/O. In phases 1 and 2, directories and files are created
- and BSD synchronously writes inodes and directory entries. There is an
- anomaly, though: even in asynchronous mode, the performance under BSD is poor.
- We suspect that the asynchronous support under FreeBSD is not fully implemented.
-
- In phase 3, the Linux and BSD times are very similar. This is a big improvement
- over the same benchmark run six months ago. While BSD used to outperform
- Linux by a factor of 3 in this test, the addition of a file name cache in the
- VFS has fixed this performance problem.
-
- In phases 4 and 5, Linux is faster than FreeBSD mainly because it uses a
- unified buffer cache management. The buffer cache space can grow when needed and
- use more memory than the one in FreeBSD, which uses a fixed size buffer cache.
- Comparison of the Ext2fs and Xiafs results shows that the optimizations included
- in Ext2fs are really useful: the performance gain between Ext2fs and Xiafs is
- around 5-10%.
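-
- The quoted gain can be reproduced from table A.3. The helper below is a
- sketch that assumes the gain is measured relative to the slower time (here
- the Xia fs figure); the text does not say which baseline was used, and the
- function name is invented.
-

```c
#include <assert.h>

/* Relative gain, in percent, of a faster time over a slower one. */
double toy_gain_percent(double slow_ms, double fast_ms)
{
    assert(slow_ms > 0.0);
    return 100.0 * (slow_ms - fast_ms) / slow_ms;
}
```

- With the table A.3 values, phase 4 gives 100 * (12912 - 11685) / 12912, about
- 9.5%, and phase 5 gives 100 * (66997 - 63210) / 66997, about 5.7%.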
-
- A.8 Conclusion
-
- The Second Extended File System is probably the most widely used filesystem in
- the Linux community. It provides standard Unix file semantics and advanced
- features. Moreover, thanks to the optimizations included in the kernel code, it
- is robust and offers excellent performance.
-
- Since Ext2fs has been designed with evolution in mind, it contains hooks that
- can be used to add new features. Some people are working on extensions to the
- current filesystem: access control lists conforming to the Posix semantics [7],
- undelete, and on the fly file compression.
-
- Ext2fs was first developed and integrated in the Linux kernel and is now
- actively being ported to other operating systems. An Ext2fs server running on
- top of the GNU Hurd has been implemented. People are also working on an Ext2fs
- port in the LITES server, running on top of the Mach microkernel [1], and in the
- VSTa operating system. Last, but not least, Ext2fs is an important part of the
- Masix operating system [4], currently under development by one of the authors.
-
- Acknowledgments
-
- The Ext2fs kernel code and tools have been written mostly by the authors of this
- paper. Some other people have also contributed to the development of Ext2fs
- either by suggesting new features or by sending patches. We want to thank these
- contributors for their help.
-
-
- Bibliography
-
- [1] M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and
- M. Young. Mach: A New Kernel Foundation For UNIX Development. In
- Proceedings of the USENIX 1986 Summer Conference, June 1986.
-
- [2] M. Bach. The Design of the UNIX Operating System. Prentice Hall, 1986.
-
- [3] E. Bina and P. Emrath. A Faster fsck for BSD Unix. In Proceedings of the
- USENIX Winter Conference, January 1986.
-
- [4] R. Card, E. Commelin, S. Dayras, and F. Mével. The MASIX Multi-Server
- Operating System. In OSF Workshop on Microkernel Technology for
- Distributed Systems, June 1993.
-
- [5] S. Kleiman. Vnodes: An Architecture for Multiple File System Types in Sun
- UNIX. In Proceedings of the Summer USENIX Conference, pages 260-269, June
- 1986.
-
- [6] M. McKusick, W. Joy, S. Leffler, and R. Fabry. A Fast File System for UNIX.
- ACM Transactions on Computer Systems, 3:181-197, August 1984.
-
- [7] Institute of Electrical and Electronics Engineers, Inc. Security interface
- for the portable operating system interface for computer environments -
- draft 13, 1992.
-
- [8] M. Seltzer, K. Bostic, M. McKusick, and C. Staelin. An Implementation of a
- Log-Structured File System for UNIX. In Proceedings of the USENIX Winter
- Conference, January 1993.
-
- [9] A. Tanenbaum. Operating Systems: Design and Implementation. Prentice Hall,
- 1987.
-
-
-
-
-
-
-
-
-
- Appendix B
-
-
-
- Measuring Holes
-
-
-
- This appendix contains the interesting part of the program used to measure the
- potential for holes in a filesystem. The source distribution of the book
- contains the full source code (sag/measure-holes/measure-holes.c).
-
- int process(FILE *f, char *filename) {
-     static char *buf = NULL;
-     static long prev_block_size = -1;
-     long zeroes;
-     char *p;
-
-     /* (Re)allocate the buffer whenever the block size changes. */
-     if (buf == NULL || prev_block_size != block_size) {
-         free(buf);
-         buf = xmalloc(block_size + 1);
-         buf[block_size] = 1;    /* non-zero sentinel stops the scan below */
-         prev_block_size = block_size;
-     }
-     zeroes = 0;
-     while (fread(buf, block_size, 1, f) == 1) {
-         /* Skip leading zero bytes; the sentinel bounds the loop. */
-         for (p = buf; *p == '\0'; )
-             ++p;
-         if (p == buf+block_size)    /* the whole block was zeroes */
-             zeroes += block_size;
-     }
-     if (zeroes > 0)
-         printf("%ld %s\n", zeroes, filename);
-     if (ferror(f)) {
-         errormsg(0, -1, "read failed for `%s'", filename);
-         return -1;
-     }
-     return 0;
- }
-
-
-
-
-
-
-
-
- Appendix C
-
-
-
- The Linux Device List
-
-
-
- This is the device list, maintained by H. Peter Anvin (Peter.Anvin@linux.org),
- at ftp://ftp.yggdrasil.com/pub/device-list/devices.tex. The rest of this text
- is by Peter.
-
- C.1 Introduction
-
- This list is the successor to Rick Miller's Linux Device List, which he stopped
- maintaining when he lost network access in 1993. It is a registry of allocated
- major device numbers, as well as the recommended /dev directory nodes for these
- devices.
-
- This list is available via FTP from ftp.yggdrasil.com in the directory
- /pub/device-list; filename is devices.format where format is txt (ASCII), tex
- (LATEX), dvi (DVI) or ps (PostScript). In cases of discrepancy, the LATEX
- version has priority.
-
- This document is included by reference into the Linux Filesystem Standard
- (FSSTND). The FSSTND is available via FTP from tsx-11.mit.edu in the directory
- /pub/linux/docs/linux-standards/fsstnd.
-
- To have a major number allocated, or a minor number in situations where that
- applies (e.g. busmice), please contact me. Also, if you have additional
- information regarding any of the devices listed below, I would like to know.
-
- Allocations marked (68k) apply to Linux/68k only.
-
-
-
- C.2 Major numbers
-
-
- 0 Unnamed devices (NFS mounts, loopback devices)
- 1 char Memory devices
- block RAM disk
-
- 2 char Reserved for PTY's <tytso@athena.mit.edu>
- block Floppy disks
-
- 3 char Reserved for PTY's <tytso@athena.mit.edu>
- block First MFM, RLL and IDE hard disk/CD-ROM interface
-
- 4 char TTY devices
-
- 5 char Alternate TTY devices
-
- 6 char Parallel printer devices
-
- 7 char Virtual console access devices
-
- 8 block SCSI disk devices
-
- 9 char SCSI tape devices
- block Multiple disk devices
-
- 10 char Non-serial mice, misc features
-
- 11 block SCSI CD-ROM devices
-
- 12 char QIC-02 tape
- block MSCDEX CD-ROM callback support
-
- 13 char PC speaker
- block 8-bit MFM/RLL/IDE controller
-
- 14 char Sound card
- block BIOS harddrive callback support
-
- 15 char Joystick
- block Sony CDU-31A/CDU-33A CD-ROM
-
- 16 char Reserved for scanners
- block GoldStar CD-ROM
-
- 17 char Chase serial card (Under development)
- block Optics Storage CD-ROM (Under development)
-
- 18 char Chase serial card - alternate devices
- block Sanyo CD-ROM (Under development)
-
- 19 char Cyclades serial card
- block Double compressed disk
-
- 20 char Cyclades serial card - alternate devices
- block Hitachi CD-ROM (Under development)
-
- 21 char Generic SCSI access
-
- 22 char Digiboard serial card
- block Second MFM, RLL and IDE hard disk/CD-ROM interface
-
- 23 char Digiboard serial card - alternate devices
- block Mitsumi proprietary CD-ROM
-
- 24 char Stallion serial card
- block Sony CDU-535 CD-ROM
-
- 25 char Stallion serial card - alternate devices
- block First Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 26 char Frame grabbers
- block Second Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 27 char QIC-117 tape
- block Third Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 28 char Stallion serial card - card programming
- block Fourth Matsushita (Panasonic/SoundBlaster) CD-ROM
- block ACSI disk (68k)
-
- 29 char Universal frame buffer
- block Aztech/Orchid/Okano/Wearnes CD-ROM
-
- 30 char iBCS-2
- block Philips LMS-205 CD-ROM
-
- 31 char MPU-401 MIDI
- block ROM/flash memory card
-
- 32 block Philips LMS-206 CD-ROM
-
- 33 block Modular RAM disk
-
- 34-223 Unallocated
-
- 224-254 Local use
-
- 255 Reserved
-
-
-
- C.3 Minor numbers
-
- 0 Unnamed devices (NFS mounts, loopback devices)
- 0 reserved as null device number
-
- 1 char Memory devices
- 1 /dev/mem Physical memory access
- 2 /dev/kmem Kernel virtual memory access
- 3 /dev/null Null device
- 4 /dev/port I/O port access
- 5 /dev/zero Null byte source
- 6 /dev/core OBSOLETE - should be a link to /proc/kcore
- 7 /dev/full Returns ENOSPC on write
-
- block RAM disk
-
- 1 /dev/ramdisk RAM disk
-
- 2 char Reserved for PTY's <tytso@athena.mit.edu>
-
- block Floppy disks
- 0 /dev/fd0 Controller 1, drive 1 autodetect
- 1 /dev/fd1 Controller 1, drive 2 autodetect
- 2 /dev/fd2 Controller 1, drive 3 autodetect
- 3 /dev/fd3 Controller 1, drive 4 autodetect
- 128 /dev/fd4 Controller 2, drive 1 autodetect
- 129 /dev/fd5 Controller 2, drive 2 autodetect
- 130 /dev/fd6 Controller 2, drive 3 autodetect
- 131 /dev/fd7 Controller 2, drive 4 autodetect
-
- To specify format, add to the autodetect device number:
- 0 /dev/fd? Autodetect format
- 4 /dev/fd?d360 5.25" 360K in a 360K drive (1)
- 20 /dev/fd?h360 5.25" 360K in a 1200K drive (1)
- 48 /dev/fd?h410 5.25" 410K in a 1200K drive
- 64 /dev/fd?h420 5.25" 420K in a 1200K drive
- 24 /dev/fd?h720 5.25" 720K in a 1200K drive
- 80 /dev/fd?h880 5.25" 880K in a 1200K drive (1)
- 8 /dev/fd?h1200 5.25" 1200K in a 1200K drive (1)
- 40 /dev/fd?h1440 5.25" 1440K in a 1200K drive (1)
- 56 /dev/fd?h1476 5.25" 1476K in a 1200K drive
- 72 /dev/fd?h1494 5.25" 1494K in a 1200K drive
- 92 /dev/fd?h1600 5.25" 1600K in a 1200K drive (1)
-
- 12 /dev/fd?u360 3.5" 360K Double Density
- 16 /dev/fd?u720 3.5" 720K Double Density (1)
- 120 /dev/fd?u800 3.5" 800K Double Density (2)
- 52 /dev/fd?u820 3.5" 820K Double Density
- 68 /dev/fd?u830 3.5" 830K Double Density
- 84 /dev/fd?u1040 3.5" 1040K Double Density (1)
- 88 /dev/fd?u1120 3.5" 1120K Double Density (1)
- 28 /dev/fd?u1440 3.5" 1440K High Density (1)
- 124 /dev/fd?u1600 3.5" 1600K High Density (1)
- 44 /dev/fd?u1680 3.5" 1680K High Density (3)
- 60 /dev/fd?u1722 3.5" 1722K High Density
- 76 /dev/fd?u1743 3.5" 1743K High Density
- 96 /dev/fd?u1760 3.5" 1760K High Density
- 116 /dev/fd?u1840 3.5" 1840K High Density (3)
- 100 /dev/fd?u1920 3.5" 1920K High Density (1)
- 32 /dev/fd?u2880 3.5" 2880K Extra Density (1)
- 104 /dev/fd?u3200 3.5" 3200K Extra Density
- 108 /dev/fd?u3520 3.5" 3520K Extra Density
- 112 /dev/fd?u3840 3.5" 3840K Extra Density (1)
-
- 36 /dev/fd?CompaQ Compaq 2880K drive; probably obsolete
-
- (1) Autodetectable format
- (2) Autodetectable format in a Double Density (720K) drive only
- (3) Autodetectable format in a High Density (1440K) drive only
-
- NOTE: The letter in the device name (d, q, h or u) signifies the type of
- drive supported: 5.25" Double Density (d), 5.25" Quad Density (q), 5.25" High
- Density (h) or 3.5" (any type, u). The capital letters D, H, or E for the 3.5"
- models have been deprecated, since the drive type is insignificant for these
- devices.
-
-
- 3 char Reserved for PTY's <tytso@athena.mit.edu>
-
- block First MFM, RLL and IDE hard disk/CD-ROM interface
- 0 /dev/hda Master: whole disk (or CD-ROM)
- 64 /dev/hdb Slave: whole disk (or CD-ROM)
-
- For partitions, add to the whole disk device number
- 0 /dev/hd? Whole disk
- 1 /dev/hd?1 First primary partition
- 2 /dev/hd?2 Second primary partition
- 3 /dev/hd?3 Third primary partition
- 4 /dev/hd?4 Fourth primary partition
- 5 /dev/hd?5 First logical partition
- 6 /dev/hd?6 Second logical partition
- 7 /dev/hd?7 Third logical partition
- : : :
- 63 /dev/hd?63 59th logical partition
-
- 4 char TTY devices
-
- 0 /dev/console Console device
- 1 /dev/tty1 First virtual console
- : : :
- 63 /dev/tty63 63rd virtual console
- 64 /dev/ttyS0 First serial port
- : : :
- 127 /dev/ttyS63 64th serial port
- 128 /dev/ptyp0 First pseudo-tty master
- : : :
- 191 /dev/ptysf 64th pseudo-tty master
- 192 /dev/ttyp0 First pseudo-tty slave
- : : :
- 255 /dev/ttysf 64th pseudo-tty slave
-
- Pseudo-tty's are named as follows:
-
- o Masters are pty, slaves are tty;
-
- o the fourth letter is one of pqrs indicating the 1st, 2nd, 3rd, 4th series of
- 16 pseudo-ttys each, and
-
- o the fifth letter is one of 0123456789abcdef indicating the position within
- the series.
-
-
- 5 char Alternate TTY devices
- 0 /dev/tty Current TTY device
- 64 /dev/cua0 Callout device corresponding to ttyS0
- : : :
- 127 /dev/cua63 Callout device corresponding to ttyS63
-
- 6 char Parallel printer devices
-
- 0 /dev/lp0 First parallel printer (0x3bc)
- 1 /dev/lp1 Second parallel printer (0x378)
- 2 /dev/lp2 Third parallel printer (0x278)
-
- Not all computers have the 0x3bc parallel port, hence the "first" printer may be
- either /dev/lp0 or /dev/lp1.
-
-
- 7 char Virtual console access devices
-
- 0 /dev/vcs Current vc text access
- 1 /dev/vcs1 tty1 text access
- : : :
- 63 /dev/vcs63 tty63 text access
- 128 /dev/vcsa Current vc text/attribute access
- 129 /dev/vcsa1 tty1 text/attribute access
- : : :
- 191 /dev/vcsa63 tty63 text/attribute access
-
- NOTE: These devices permit both read and write access.
-
- 8 block SCSI disk devices
-
- 0 /dev/sda First SCSI disk whole disk
- 16 /dev/sdb Second SCSI disk whole disk
- 32 /dev/sdc Third SCSI disk whole disk
- : : :
- 240 /dev/sdp Sixteenth SCSI disk whole disk
-
- Partitions are handled in the same way as for IDE disks (see major number 3)
- except that the limit on logical partitions is 11 rather than 59 per disk.
-
-
- 9 char SCSI tape devices
-
- 0 /dev/st0 First SCSI tape
- 1 /dev/st1 Second SCSI tape
- : : :
- 128 /dev/nst0 First SCSI tape, no rewind-on-close
- 129 /dev/nst1 Second SCSI tape, no rewind-on-close
- : : :
-
- block Multiple disk devices
-
- 0 /dev/md0 First device group
- 1 /dev/md1 Second device group
- : : :
-
- The multiple device driver is used to span a filesystem across multiple physical
- disks.
-
- 10 char Non-serial mice, misc features
-
- 0 /dev/logibm Logitech bus mouse
- 1 /dev/psaux PS/2-style mouse port
- 2 /dev/inportbm Microsoft Inport bus mouse
- 3 /dev/atibm ATI XL bus mouse
- 4 /dev/jbm J-mouse
- 4 /dev/amigamouse Amiga Mouse (68k)
- 5 /dev/atarimouse Atari Mouse (68k)
- 128 /dev/beep Fancy beep device
- 129 /dev/modreq Kernel module load request
-
- 11 block SCSI CD-ROM devices
-
- 0 /dev/sr0 First SCSI CD-ROM
- 1 /dev/sr1 Second SCSI CD-ROM
- : : :
-
- The prefix /dev/scd instead of /dev/sr has been used as well, and might make
- more sense.
-
-
- 12 char QIC-02 tape
-
- 2 /dev/ntpqic11 QIC-11, no rewind-on-close
- 3 /dev/tpqic11 QIC-11, rewind-on-close
- 4 /dev/ntpqic24 QIC-24, no rewind-on-close
- 5 /dev/tpqic24 QIC-24, rewind-on-close
- 6 /dev/ntpqic120 QIC-120, no rewind-on-close
- 7 /dev/tpqic120 QIC-120, rewind-on-close
- 8 /dev/ntpqic150 QIC-150, no rewind-on-close
- 9 /dev/tpqic150 QIC-150, rewind-on-close
-
- The device names specified are proposed - if there are "standard" names for
- these devices, please let me know.
-
-
- block MSCDEX CD-ROM callback support
-
- 0 /dev/dos_cd0 First MSCDEX CD-ROM
- 1 /dev/dos_cd1 Second MSCDEX CD-ROM
- : : :
-
- 13 char PC speaker
-
- 0 /dev/pcmixer Emulates /dev/mixer
- 3 /dev/pcsp Emulates /dev/dsp (8-bit)
- 4 /dev/pcaudio Emulates /dev/audio
- 5 /dev/pcsp16 Emulates /dev/dsp (16-bit)
-
- block 8-bit MFM/RLL/IDE controller
-
- 0 /dev/xda First XT disk whole disk
- 64 /dev/xdb Second XT disk whole disk
-
- Partitions are handled in the same way as IDE disks (see major number 3).
-
- 14 char Sound card
-
- 0 /dev/mixer Mixer control
- 1 /dev/sequencer Audio sequencer
- 2 /dev/midi00 First MIDI port
- 3 /dev/dsp Digital audio
- 4 /dev/audio Sun-compatible digital audio
- 6 /dev/sndstat Sound card status information
- 8 /dev/sequencer2 Sequencer - alternate device
- 16 /dev/mixer1 Second soundcard mixer control
- 17 /dev/patmgr0 Sequencer patch manager
- 18 /dev/midi01 Second MIDI port
- 19 /dev/dsp1 Second soundcard digital audio
- 20 /dev/audio1 Second soundcard Sun digital audio
- 33 /dev/patmgr1 Sequencer patch manager
- 34 /dev/midi02 Third MIDI port
- 50 /dev/midi03 Fourth MIDI port
-
- block BIOS harddrive callback support
-
- 0 /dev/dos_hda First BIOS harddrive whole disk
- 64 /dev/dos_hdb Second BIOS harddrive whole disk
- 128 /dev/dos_hdc Third BIOS harddrive whole disk
- 192 /dev/dos_hdd Fourth BIOS harddrive whole disk
-
- Partitions are handled in the same way as IDE disks (see major number 3).
-
- 15 char Joystick
-
- 0 /dev/js0 First joystick
- 1 /dev/js1 Second joystick
-
- block Sony CDU-31A/CDU-33A CD-ROM
-
- 0 /dev/sonycd Sony CDU-31A CD-ROM
-
- 16 char Reserved for scanners
-
- block GoldStar CD-ROM
-
- 0 /dev/gscd GoldStar CD-ROM
-
- 17 char Chase serial card (Under development)
-
- 0 /dev/ttyH0 First Chase port
- 1 /dev/ttyH1 Second Chase port
- : : :
-
-
-
- block Optics Storage CD-ROM (Under development)
-
- 0 /dev/optcd Optics Storage CD-ROM
-
- 18 char Chase serial card - alternate devices
-
- 0 /dev/cuh0 Callout device corresponding to ttyH0
- 1 /dev/cuh1 Callout device corresponding to ttyH1
- : : :
-
- block Sanyo CD-ROM (Under development)
-
- 0 ? Sanyo CD-ROM
-
- 19 char Cyclades serial card
-
- 32 /dev/ttyC0 First Cyclades port
- : : :
- 63 /dev/ttyC31 32nd Cyclades port
-
- It would make more sense for these to start at 0...
-
- block "Double" compressed disk
-
- 0 /dev/double0 First compressed disk
- : : :
- 7 /dev/double7 Eighth compressed disk
- 128 /dev/cdouble0 Mirror of first compressed disk
- : : :
- 135 /dev/cdouble7 Mirror of eighth compressed disk
-
- See the Double documentation for an explanation of the "mirror" devices.
-
- 20 char Cyclades serial card - alternate devices
-
- 32 /dev/cub0 Callout device corresponding to ttyC0
- : : :
- 63 /dev/cub31 Callout device corresponding to ttyC31
-
- block Hitachi CD-ROM (Under development)
-
- 0 /dev/hitcd Hitachi CD-ROM
-
- 21 char Generic SCSI access
-
- 0 /dev/sg0 First generic SCSI device
- 1 /dev/sg1 Second generic SCSI device
- : : :
-
- 22 char Digiboard serial card
-
- 0 /dev/ttyD0 First Digiboard port
- 1 /dev/ttyD1 Second Digiboard port
- : : :
-
- block Second MFM, RLL and IDE hard disk/CD-ROM interface
-
- 0 /dev/hdc Master: whole disk (or CD-ROM)
- 64 /dev/hdd Slave: whole disk (or CD-ROM)
-
- Partitions are handled the same way as for the first interface (see major number
- 3).
-
- 23 char Digiboard serial card - alternate devices
-
- 0 /dev/cud0 Callout device corresponding to ttyD0
- 1 /dev/cud1 Callout device corresponding to ttyD1
- : : :
-
- block Mitsumi proprietary CD-ROM
-
- 0 /dev/mcd Mitsumi CD-ROM
-
- 24 char Stallion serial card
-
- 0 /dev/ttyE0 Stallion port 0 board 0
- 1 /dev/ttyE1 Stallion port 1 board 0
- : : :
- 64 /dev/ttyE64 Stallion port 0 board 1
- 65 /dev/ttyE65 Stallion port 1 board 1
- : : :
- 128 /dev/ttyE128 Stallion port 0 board 2
- 129 /dev/ttyE129 Stallion port 1 board 2
- : : :
- 192 /dev/ttyE192 Stallion port 0 board 3
- 193 /dev/ttyE193 Stallion port 1 board 3
- : : :
-
- block Sony CDU-535 CD-ROM
-
- 0 /dev/cdu535 Sony CDU-535 CD-ROM
-
- 25 char Stallion serial card - alternate devices
-
- 0 /dev/cue0 Callout device corresponding to ttyE0
- 1 /dev/cue1 Callout device corresponding to ttyE1
- : : :
- 64 /dev/cue64 Callout device corresponding to ttyE64
- 65 /dev/cue65 Callout device corresponding to ttyE65
- : : :
- 128 /dev/cue128 Callout device corresponding to ttyE128
- 129 /dev/cue129 Callout device corresponding to ttyE129
- : : :
- 192 /dev/cue192 Callout device corresponding to ttyE192
- 193 /dev/cue193 Callout device corresponding to ttyE193
- : : :
-
- block First Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 0 /dev/sbpcd0 Panasonic CD-ROM controller 0 unit 0
- 1 /dev/sbpcd1 Panasonic CD-ROM controller 0 unit 1
- 2 /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2
- 3 /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3
-
- 26 char Frame grabbers
-
- 0 /dev/wvisfgrab Quanta WinVision frame grabber
-
- block Second Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 0 /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0
- 1 /dev/sbpcd5 Panasonic CD-ROM controller 1 unit 1
- 2 /dev/sbpcd6 Panasonic CD-ROM controller 1 unit 2
- 3 /dev/sbpcd7 Panasonic CD-ROM controller 1 unit 3
-
- 27 char QIC-117 tape
-
- 0 /dev/rft0 Unit 0, rewind-on-close
- 1 /dev/rft1 Unit 1, rewind-on-close
- 2 /dev/rft2 Unit 2, rewind-on-close
- 3 /dev/rft3 Unit 3, rewind-on-close
- 4 /dev/nrft0 Unit 0, no rewind-on-close
- 5 /dev/nrft1 Unit 1, no rewind-on-close
- 6 /dev/nrft2 Unit 2, no rewind-on-close
- 7 /dev/nrft3 Unit 3, no rewind-on-close
-
- block Third Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 0 /dev/sbpcd8 Panasonic CD-ROM controller 2 unit 0
- 1 /dev/sbpcd9 Panasonic CD-ROM controller 2 unit 1
- 2 /dev/sbpcd10 Panasonic CD-ROM controller 2 unit 2
- 3 /dev/sbpcd11 Panasonic CD-ROM controller 2 unit 3
-
- 28 char Stallion serial card - card programming
-
- 0 /dev/staliomem0 First Stallion I/O card memory
- 1 /dev/staliomem1 Second Stallion I/O card memory
- 2 /dev/staliomem2 Third Stallion I/O card memory
- 3 /dev/staliomem3 Fourth Stallion I/O card memory
-
- block Fourth Matsushita (Panasonic/SoundBlaster) CD-ROM
-
- 0 /dev/sbpcd12 Panasonic CD-ROM controller 3 unit 0
- 1 /dev/sbpcd13 Panasonic CD-ROM controller 3 unit 1
- 2 /dev/sbpcd14 Panasonic CD-ROM controller 3 unit 2
- 3 /dev/sbpcd15 Panasonic CD-ROM controller 3 unit 3
-
- block ACSI disk (68k)
-
- 0 /dev/ada First ACSI disk whole disk
- 16 /dev/adb Second ACSI disk whole disk
- 32 /dev/adc Third ACSI disk whole disk
- : : :
- 240 /dev/adp Sixteenth ACSI disk whole disk
-
- Partitions are handled in the same way as for IDE disks (see major number 3)
- except that the limit on logical partitions is 11 rather than 59 per disk.
-
- 29 char Universal frame buffer
-
- 0 /dev/fb0current First frame buffer
- 1 /dev/fb0autodetect
- : : :
- 16 /dev/fb1current Second frame buffer
- 17 /dev/fb1autodetect
- : : :
-
-
-
- The universal frame buffer device is currently supported only on Linux/68k. The
- current device accesses the frame buffer at current resolution; the autodetect
- one at bootup (default) resolution. Minor numbers 2-15 within each frame buffer
- assignment are used for specific device-dependent resolutions. There appears to
- be no standard naming for these devices.
-
- block Aztech/Orchid/Okano/Wearnes CD-ROM
-
- 0 /dev/aztcd Aztech CD-ROM
-
- 30 char iBCS-2 compatibility devices
-
- 0 /dev/socksys Socket access
- 1 /dev/spx SVR3 local X interface
- 2 /dev/inet/arp Network access
- 2 /dev/inet/icmp Network access
- 2 /dev/inet/ip Network access
- 2 /dev/inet/udp Network access
- 2 /dev/inet/tcp Network access
-
- iBCS-2 requires /dev/nfsd to be a link to /dev/socksys and /dev/X0R to be a link
- to /dev/null.
-
- block Philips LMS CM-205 CD-ROM
-
- 0 /dev/cm205cd Philips LMS CM-205 CD-ROM
-
- /dev/lmscd is an older name for this drive. This driver does not work with the
- CM-205MS CD-ROM.
-
- 31 char MPU-401 MIDI
-
- 0 /dev/mpu401data MPU-401 data port
- 1 /dev/mpu401stat MPU-401 status port
-
- block ROM/flash memory card
-
- 0 /dev/rom0 First ROM card (rw)
- : : :
- 7 /dev/rom7 Eighth ROM card (rw)
- 8 /dev/rrom0 First ROM card (ro)
- : : :
- 15 /dev/rrom7 Eighth ROM card (ro)
- 16 /dev/flash0 First flash memory card (rw)
- : : :
- 23 /dev/flash7 Eighth flash memory card (rw)
- 24 /dev/rflash0 First flash memory card (ro)
- : : :
- 31 /dev/rflash7 Eighth flash memory card (ro)
-
- The read-write (rw) devices support back-caching written data in RAM, as well
- as writing to flash RAM devices. The read-only devices (ro) support reading
- only.
-
- 32 block Philips LMS CM-206 CD-ROM
-
- 0 /dev/cm206cd Philips LMS CM-206 CD-ROM
-
- 33 block Modular RAM disk
-
- 0 /dev/ram0 First modular RAM disk
- 1 /dev/ram1 Second modular RAM disk
- : : :
- 255 /dev/ram255 256th modular RAM disk
-
- 34-223 Unallocated
-
- 224-254 Local/experimental use
-
- For devices not assigned official numbers, this range should be used, in
- order to avoid conflict with future assignments. Please note that MAX_CHRDEV
- and MAX_BLKDEV in linux/include/linux/major.h must be set to a value greater
- than the highest used major number. For a kernel using local/experimental
- devices, it is probably easiest to set both of these equal to 256. Compared with
- the default value of 64, this costs an additional 3K of memory.
-
- 255 Reserved
-
- C.4 Additional /dev directory entries
-
- This section details additional entries that should or may exist in the /dev
- directory. It is preferred that symbolic links use the same form (absolute or
- relative) as is indicated here. Links are classified as hard or symbolic
- depending on the preferred type of link; if possible, the indicated type of link
- should be used.
-
- C.4.1 Compulsory links
-
- These links should exist on all systems:
-
- /dev/fd /proc/self/fd symbolic File descriptors
- /dev/stdin fd/0 symbolic Standard input file descriptor
- /dev/stdout fd/1 symbolic Standard output file descriptor
- /dev/stderr fd/2 symbolic Standard error file descriptor
-
- C.4.2 Recommended links
-
- It is recommended that these links exist on all systems:
-
- /dev/X0R null symbolic Used by iBCS-2
- /dev/nfsd socksys symbolic Used by iBCS-2
- /dev/core /proc/kcore symbolic Backward compatibility
- /dev/scd? sr? hard Alternate name for CD-ROMs
-
- C.4.3 Locally defined links
-
- The following links may be established locally to conform to the configuration
- of the system. This is merely a tabulation of existing practice, and does not
- constitute a recommendation. However, if they exist, they should have the
- following uses.
-
- /dev/mouse mouse port symbolic Current mouse device
- /dev/tape tape device symbolic Current tape device
- /dev/cdrom CD-ROM device symbolic Current CD-ROM device
- /dev/modem modem port symbolic Current dialout device
- /dev/root root device symbolic Current root filesystem
- /dev/swap swap device symbolic Current swap device
-
- /dev/modem should not be used for a modem which supports dialin as well as
- dialout, as it tends to cause lock file problems. If it exists, /dev/modem
- should point to the appropriate dialout (alternate) device.
-
- C.4.4 Sockets and pipes
-
- Non-transient sockets or named pipes may exist in /dev. Common entries are:
-
- /dev/printer socket lpd local socket
- /dev/log socket syslog local socket
-
-
- Bibliography
-
-
- [Car95] Rémy Card. The second extended filesystem: current state,
- future development, 1995. Slides used during presentation at the
- Second International Linux and Internet Conference, in Berlin, May
- 1995. Available via anonymous FTP from ftp.ibp.fr, in the
- directory /pub/linux/packages/ext2fs/slides/berlin.
-
- [NSS89] Evi Nemeth, Garth Snyder, and Scott Seebass. UNIX System
- Administration Handbook. Prentice-Hall, 1989. From Anonymous: I
- haven't seen any others to compare this one to, so I don't know that
- I'd particularly recommend it. It does cover both BSD and SYSV,
- though, so it might be more useful to a Linux sysadmin than a single
- book that focussed on BSD or SYSV exclusively.
-
- [POL93] Jerry Peek, Tim O'Reilly, and Mike Loukides. UNIX Power Tools. Bantam,
- 1993. From Anonymous: Not a comprehensive guide to much of anything,
- but it does include a LOT of hints and tips at the sysadmin level.
- This comes with a CD-ROM full of useful Unix programs, too.
-
- [Qui95] Daniel Quinlan. Linux Filesystem Structure, Release 1.2, March 1995. A
- description of and a proposal for a standard Linux directory tree,
- with the intention of making it easier to package software and
- administer Linux systems by making files appear in standard places.
- Follows fairly closely traditional Unix practice, and has got support
- from most Linux distributions. Available via FTP from ftp.funet.fi,
- directory /pub/Linux/doc/fsstnd.
-
- [Ray91] Eric Raymond, editor. The New Hacker's Dictionary. MIT Press, 1991. A
- dictionary of the slang and jargon used by hackers. A book version of
- the Jargon File, which contains all the text of the book (typically in
- a more up-to-date form), and which is in the public domain.
-
-
-